

Report  No.  UIUCDCS-R-73-596


A  GENERALIZED  LEXICAL  SCANNER  FOR  A  TRANSLATOR  WRITING  SYSTEM 


by 


Albert  Cannon  Baker,  Jr.


October  1973 




DEPARTMENT  OF  COMPUTER  SCIENCE 
UNIVERSITY  OF  ILLINOIS  AT  URBANA-CHAMPAIGN 


URBANA,  ILLINOIS


Report  No.  UIUCDCS-R-73-596 


A  GENERALIZED  LEXICAL  SCANNER  FOR  A  TRANSLATOR  WRITING  SYSTEM* 


by 
Albert  Cannon  Baker,  Jr.


October  1973 


Department  of  Computer  Science 
University  of  Illinois  at  Urbana-Champaign 
Urbana,  Illinois  61801 


This  work  was  supported  in  part  by  the  National  Science  Foundation  under 
Grant  No.  US  NSF-GJ-328  and  was  submitted  in  partial  fulfillment  of  the 
requirements  for  the  degree  of  Master  of  Science  in  Computer  Science, 
October  1973. 




A  GENERALIZED  LEXICAL  SCANNER  FOR  A  TRANSLATOR  WRITING  SYSTEM 


Albert  Cannon  Baker,  Jr.,  M.S. 
Department  of  Computer  Science 
University  of  Illinois  at  Urbana-Champaign,  1973 


This is an expository paper that is concerned with a Lexical
Scanner for a translator writing system that has been in use at the
University of Illinois. Its significant features include a structured,
binary-tree symbol table, a parameterized macro expander, and a compile-time
flexibility for assigning characters that make up the basic terminal
symbols. A comprehensive example of the scanner's operation is also
included.

1.  INTRODUCTION

1.1  The  Translator  Writing  System 

One of the software efforts undertaken for the Illiac IV project at the
University of Illinois at Urbana-Champaign was the development of a
Translator Writing System (TWS), which permits the implementation of
various compilers; the compilers for the Illiac IV problem-oriented
languages TRANQUIL [1] and GLYPNIR [2] are the prime examples. The main
components of the TWS are:

a) syntax meta-languages (TWINKLE [7] and TBNF [4]), which
are extensions to Backus-Naur Form [8, 9], and in which
the syntax of a programming language L is specified;

b) a semantics meta-language (Illinois Semantics Language, ISL
[5]), which is an extension of Burroughs Extended Algol that
includes special constructs to manipulate tables and stacks
and to generate object code, and in which the semantics of a
programming language L is specified;

c) a basic core for each translator consisting of the lexical
scanner, either the skeleton parser for a direct parsing
algorithm translated into Algol code or a complete table-driven
parser for an interpretive parsing algorithm [4], and
miscellaneous auxiliary procedures which are independent
of the source and object languages of the translator;

d) a system consisting of syntax preprocessors to generate
either the tables for the interpretive parser or a set of
Algol source statements that will parse the language
specified; the ISL translator to translate the ISL extensions
into Algol; a program to generate from the outputs of the
syntax preprocessor, the ISL translator, and the basic core a
complete Algol program which, when compiled by the standard
Algol compiler, will be the required translator for the
language L as specified by the syntax and semantics.

1.2  The  Lexical  Scanner

The function of a lexical scanner in a compiler is to scan characters
from a source program, combining one or more of them together to form
single terminal symbols when the syntactic recognizer (parser) makes a request
for a new symbol. As far as the TWS and the scanner are concerned, the
following symbols are deemed to be terminal symbols:

a)  special  single  characters  ($,:-+=,  etc.); 

b)  key  words  in  the  language,  including  either  reserved 
identifiers  or  special  identifiers  for  which  a  special 
character  (nominally  "#")  precedes  the  identifier  (BEGIN, 
#ELSE,  etc.); 

c)  identifiers  (X,  CASHIN1STNATIONALBANK,  etc.);

d)  string  and  numeric  literals  ("THIS  IS  A  STRING",  3.14159, 
@  234,  etc.). 

Internal to the scanner is a powerful parameterized text-type [3]
macro expander which has the capability to recognize and store declarations of
defined identifiers, and to regurgitate the stored text when the identifier is
used subsequently. This facility is transparent to the syntactic recognizer
and, except for block structure considerations, is transparent to the semantics
as well. Section 2.4 contains a discussion of the data structure of the
macros, and Appendix A includes a discussion of the syntax and semantics of
the macro generator.

The symbol table used by the scanner is a straightforward binary-tree
structure, with disjoint trees for the several terminal symbol classes
interleaved within the same table. Binary Coded Decimal (BCD) information
is stored in this table packed six characters per 48-bit Burroughs B-5500
word. Section 2.2 contains a complete description of the symbol table.

There is much flexibility built into the scanner to make the resultant
compilers both more general and easier to use. Through control card
options, the user of the compiler may specify that non-standard symbols be
used to define the terminal classes, such as using "8" instead of "@" to mark
the exponent part of a numeric literal, or that the key words in the language
would be marked by a special symbol, freeing these identifiers for the
programmer's use. The inclusion of a macro facility gives the programmer the
power to extend the basic language, or to make one language resemble another,
or to make his source code appear more readable. For example, if the compiler
for an Algol-like language were written using the TWS, one could extend the
language at compile time by adding appropriate macro definitions to a program
written in it to make it resemble COBOL:

DEFINE  ADDING  =  MEND, 
TO  =  +  MEND, 
COMPUTE  =  MEND, 
BY  =  :=  MEND  ; 

where  MEND  terminates  a  macro  definition.  Then  the  source  language  statement: 

COMPUTE  XYZ  BY  ADDING  A  TO  B; 


would  be  compiled  as: 

XYZ  :=  A+B 

since  ADDING  and  COMPUTE  were  both  defined  to  be  null. 
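
The effect of these four definitions can be sketched in a few lines of Python. This is only an illustration of text-type substitution at the token level: the function name `expand` and the pre-split token list are assumptions made for the example, and the sketch ignores parameters, block structure, and the stored-descriptor format that the actual scanner uses.

```python
def expand(tokens, macros):
    """Replace each defined identifier by its stored token list.

    Illustrative only: no parameters, no nested calls, no block structure.
    """
    out = []
    for tok in tokens:
        if tok in macros:
            out.extend(macros[tok])   # a null macro contributes nothing
        else:
            out.append(tok)
    return out


# The four definitions from the DEFINE declaration above.
MACROS = {
    "ADDING": [],        # null text
    "TO": ["+"],
    "COMPUTE": [],       # null text
    "BY": [":="],
}

source = ["COMPUTE", "XYZ", "BY", "ADDING", "A", "TO", "B", ";"]
expanded = expand(source, MACROS)
print(" ".join(expanded))
```

The parser never sees COMPUTE, BY, ADDING, or TO; it receives only the tokens of the equivalent assignment statement.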

Thus, to reiterate, the main functions of this lexical scanner are
to assemble the terminal symbols from the source string, to pass simple
representations of those symbols to the syntactic recognizer, to maintain the
BCD symbol table, and to perform macro expansion. The data structures behind
these functions are the subject of Chapter 2, and a functional description of
these functions in terms of the Algol procedures that implement them is the
subject of Chapter 3.


2.  SCANNER  DATA  STRUCTURES 

2.1  General  Considerations 

The structure of the internal tables and the algorithms that use them
will have a great effect on the speed and efficiency of any program. The
lexical scanner is one of the most used procedures in any compiler, and
attention must be paid to make it as efficient as possible. In the
implementation described here, one of the main considerations is the structure
of the language into which the compilers are translated, Burroughs extended
Algol for the B-5500. A brief introduction to the B-5500 and its constraints
on the Algol language is appropriate here.

The  B-5500  [8]  is  a  multiprogramming,  multiprocessor  computer 
system.  With  a  limited  (32K  48-bit  words)  main  memory,  it  relies  heavily  on 
segmentation  of  both  programs  and  data  to  make  most  effective  use  of  limited 
memory  to  service  the  various  programs  in  the  mix.  Specifically,  programs 
and  arrays  are  broken  down  into  segments,  each  no  larger  than  1024  words. 
The  program  segments  are  stored  on  the  disk.  When  a  program  enters  the  mix 
to  be  run,  it  is  assigned  a  fixed,  non-overlayable,  contiguous  area  for  a
run-time  stack  and  program  reference  table.  This  latter  contains  storage  for 
single  variables  and  descriptors  relating  to  each  program  and  array  segment. 
Then,  program  and  data  segments  are  read  off  the  disk  as  they  are  needed,  and 
assigned  space  in  core  possibly  overlaying  previous  information  from  any  of 
the  programs  in  the  mix.  If  the  area  being  overlaid  contains  only  program 
segments,  or  array  segments  that  have  not  been  written  into,  it  is  simply 
overwritten;  the  information  is  still  on  the  disk.  However,  an  area  containing 
array  segments  with  words  that  have  been  changed  causes  those  segments  to  be 
written  back  onto  the  disk  before  being  overlaid. 


The restriction that array segments be no longer than 1024 words is
of primary interest here. The segmentation is by array rows, with each row
occupying a segment. Thus, no row may be longer than 1024 words. A
one-dimensional linear array is one row, so no linear array may be longer than
1024 words. Larger linear tables must be simulated as two-dimensional arrays.
For instance, an 8192 word table could be declared with array bounds [0:15,
0:511] so there would be sixteen segments each containing 512 words. When
simulating large linear arrays, it is wise to make the range of each
subscript a power of two in order to be able to access an entry in the
table using a single index. In the case above, the row subscript requires
exactly four bits, whereas the column subscript requires exactly nine.
Thus, given a single 48-bit index I, I.[35:4] would select the proper
row, whereas I.[39:9] would select the proper column position within that row
(in the partial word notation of Burroughs extended Algol).
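
As a rough illustration of the partial word notation, the following Python sketch extracts the two subscripts from a single index. The helper names `field` and `split_index` are assumptions made for this example; the only facts taken from the text are the 48-bit word, the convention that X.[s:n] denotes n bits starting at bit s (with bit 0 the most significant), and the [0:15, 0:511] bounds.

```python
WORD_BITS = 48  # width of a B-5500 word


def field(word, start, length):
    """Return word.[start:length]; bit 0 is the most significant bit."""
    shift = WORD_BITS - start - length
    return (word >> shift) & ((1 << length) - 1)


def split_index(i):
    """Split a linear index 0..8191 into (row, column) for bounds [0:15, 0:511]."""
    row = field(i, 35, 4)      # bits 35-38: one of the 16 rows (segments)
    column = field(i, 39, 9)   # bits 39-47: one of the 512 words in that row
    return row, column
```

Because both ranges are powers of two, the split costs two bit-field extractions and no division, which is the point of the recommendation above.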

2.2  BIGTAB  -  The  Symbol  Table 

In any compiler, storage for the representations of the terminal
symbols must be made. The efficiency of the compiler can be greatly affected
by the choice of data structure for the symbol table. The specific functions
that must be optimized in the use of the symbol table are, in order of
importance:

a)  lookup 

b)  insertion 

c)  traversing. 

Furthermore, separate lists must be maintained for the four classes of
multi-character terminal symbols: <*I>, <*N>, <*S>, and <*R>.


The  basic  structure  BIGTAB  was  chosen  so  as  to  store  data  as  a 
forest  of  binary  trees,  having  four  trees  interleaved  within  one  8192-word 
table.  The  advantages  of  this  approach  are: 

a)  being  naturally  linked  lists,  interleaving  the  trees  in 
the  same  table  is  possible;  thus,  an  identifier  entry  could 
be  adjacent  in  the  table  to  a  numeric  literal;  space  has  to 
be  reserved  for  only  one  table; 

b)  lookup  is  fast  compared  to  a  linearly  linked  list; 

c)  insertion  can  be  made  in  the  next  sequential  location  in  the 
table  with  no  need  to  change  links  already  established. 

Knuth  [6]  discusses  extensively  the  characteristics  of  binary  tree  structures. 
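
A minimal sketch of this arrangement follows, written in Python rather than Algol. It is not the scanner's code: each entry is assumed to occupy a single slot (the real entries occupy an entry head plus one to eight data words), the first free location is assumed to be 16, and the class and method names are invented for illustration. The head locations 1, 2, 3 and 15 are those given in Section 2.2.1.

```python
class Forest:
    """Several binary search trees interleaved in one sequentially filled table."""

    def __init__(self, heads=(1, 2, 3, 15), first_free=16):
        self.table = {}                       # address -> entry
        self.heads = {h: 0 for h in heads}    # 0 means "empty tree"
        self.next_free = first_free           # assumed start of free space

    def _new(self, key):
        addr = self.next_free                 # always the next sequential slot
        self.table[addr] = {"key": key, "left": 0, "right": 0}
        self.next_free += 1
        return addr

    def lookup_or_insert(self, head, key):
        """Return the address of key in the tree rooted at head, adding it if absent."""
        addr = self.heads[head]
        if addr == 0:
            self.heads[head] = self._new(key)
            return self.heads[head]
        while True:
            node = self.table[addr]
            if key == node["key"]:
                return addr
            side = "left" if key < node["key"] else "right"
            if node[side] == 0:               # only a null link is ever filled in
                node[side] = self._new(key)
                return node[side]
            addr = node[side]
```

An identifier and a numeric literal inserted one after the other end up in adjacent slots, although they belong to different trees.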

2.2.1  Basic  Format 

Each tree has a head node in a fixed table location: BIGTAB[1] for
identifiers, BIGTAB[2] for numeric literals, BIGTAB[3] for string literals and
BIGTAB[15] for key words. This head node is a pointer to the root of its tree.
Each BIGTAB entry consists of an entry head plus one to eight data words to
store the BCD characters of the text.


HEAD NODE:     pointer to the root of the tree

ENTRY HEAD:    [0:16]    Semantic Part
               [16:6]    Number of Characters
               [22:13]   Left Pointer
               [35:13]   Right Pointer

DATA WORDS:    [0:12]    0
               remainder BCD Characters


Figure  1.  BIGTAB  Basic  Format 
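
The entry head of Figure 1 can be modelled as four bit fields in a 48-bit integer. The Python sketch below is only an aid to reading the figure; the names `FIELDS`, `pack_head`, `unpack_head` and `data_words_needed` are assumptions made for the example, while the field positions and the six-characters-per-word packing come from the text.

```python
WORD_BITS = 48

# (start bit, length) of each entry-head field, as given in Figure 1.
FIELDS = {
    "semantic": (0, 16),
    "nchars": (16, 6),
    "left": (22, 13),
    "right": (35, 13),
}


def pack_head(**values):
    """Build an entry-head word from named field values."""
    word = 0
    for name, value in values.items():
        start, length = FIELDS[name]
        if not 0 <= value < (1 << length):
            raise ValueError(f"{name} does not fit in {length} bits")
        word |= value << (WORD_BITS - start - length)
    return word


def unpack_head(word):
    """Return all four fields of an entry-head word as a dictionary."""
    return {
        name: (word >> (WORD_BITS - start - length)) & ((1 << length) - 1)
        for name, (start, length) in FIELDS.items()
    }


def data_words_needed(nchars):
    """Six BCD characters are stored per data word, rounded up."""
    return -(-nchars // 6)
```

The 13-bit pointers are just wide enough to address every word of the 8192-word table.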
2.2.2  Storage  for  Keywords,  Identifiers  and  String  Literals

The syntax preprocessor will extract from the syntax definition the
language-specific keywords and non-terminals and place them, linked together
in one tree in BIGTAB format, into a disk file called "<LANGNAME>/TABLES",
where <LANGNAME> is the name assigned by the compiler designer. At the start
of every execution of the TWS-built compiler, this file is read, initializing
the run-time BIGTAB. The language non-terminals (such as <BLOCK>, or
<ARITHMETIC PRIMARY>) will allow the table-driven parser to provide a trace
of the parsing path. The non-terminals are inserted into the table with a
leading blank so they will never be recognized as identifiers in the source
string. Appendix B lists the TBNF syntax of a simple language DEMALGOL, and
gives an example of the initial BIGTAB produced by this syntax.

The scanner assumes as the nominal condition that these keywords
will always be preceded by the special symbol "#" (e.g., #BEGIN), and that
BIGTAB[15] will point to this initial syntax-preprocessor-built tree. Thus,
all occurrences of "#" followed by an identifier will cause reference to this
tree. But, by control card option, the nominal condition can be replaced by
a reserved word option. In this condition, all occurrences in the source
string of all syntax-defined keyword identifiers (e.g., BEGIN) will be reserved
to have only the keyword meaning, and BIGTAB[1], the identifier tree, is set
to point to the syntax-preprocessor-built tree. Thereafter, all identifiers
in the source string will be checked against this table, and newly-defined
identifiers will be linked into it. The scanner will recognize the presence
of a reserved identifier by the fact that the BIGTAB address is within the
range of the initial table.

For keywords, identifiers and string literals, the basic format is
exactly as specified in section 2.2.1. The only difference among the three
classes is in the use of the semantic part. The BCD characters are stored
six per 48-bit word, allowing a maximum of 48 significant characters.

For identifiers, headword bit [1:1] is reserved by the scanner to
indicate that this identifier is defined as a macro or macro formal parameter.
If set, then bits [4:12] point to the address in MACROTAB of the stored text,
and bit [2:1] indicates a formal parameter. If the identifier is not defined
as a macro, bits [2:13] may be set by the compiler semantic routines as
desired. In GLYPNIR [2], as implemented using the TWS, pointers to the semantic
IDTAB and the parser MSTACK are inserted in the semantic part.

For  keywords,  the  syntax  preprocessor  places  in  the  semantic  part  a 
unique  symbol  number  for  each  keyword  in  the  syntax.  This  allows  the  parser 
to  consider  the  keyword  as  it  would  a  single  special  symbol. 

For  string  literals,  the  semantic  part  is  reserved  for  the  semantic 
routines,  typically  to  point  to  a  literal  table. 

2.2.3  Storage  for  Numeric  Literals 

A  numeric  literal  is  a  string  of  characters  that  carries  an  inherent 
semantic  value  -  the  specific  quantity  that  this  string  represents.  As  this 
semantic  value  will  be  variable  and  machine  dependent,  the  TWS  will  not  convert 
these  literals  to  an  internal  machine  representation,  but  rather  transform  the 
literal  string  to  a  normalized,  consistent  BIGTAB  entry,  with  enough  analysis 
performed  on  the  source  string  to  make  the  semantic  conversion  of  the  numeric 
literal  to  internal  representation  relatively  straightforward  for  the  semantic 
part  of  the  compiler. 

In BIGTAB, the same basic header word and data word structure applies
here as in the identifier, keyword and string literal tables. The semantic
part of the header word can be used, as in the TRANQUIL compiler implemented
using the TWS, to store a literal type, and a pointer to a semantic table.
But, in the data words, quite a different structure is used. Instead of the
BCD characters packed six per 48-bit word, the full eight character capacity
of each word is used, with the first two characters in the first data word
being used to describe certain semantic attributes of the numeric literal:


ENTRY HEAD:    [0:16]    Semantic Part
               [16:6]    Number of Characters
               [22:13]   Left Tree Pointer
               [35:13]   Right Tree Pointer

DATA WORDS     Char 0   Char 1   Char 2   Char 3   Char 4   Char 5   Char 6   Char 7
(1-8):         [0:6]    [6:6]    [12:6]   [18:6]   [24:6]   [30:6]   [36:6]   [42:6]

The first two characters (12 bits) of the first data word have the
following values:

0:1  -  Unused, always zero
1:1  -  Base indicator
        =0  Decimal numeral, base 10
        =1  Nondecimal numeral, base 2 to 36
2:1  -  Numeric type
        =0  Integer
        =1  Real
3:1  -  Sign of exponent
        =0  Positive
        =1  Negative
4:2  -  Number of exponent digits (I)
        0-3 (i.e., exponent 0-999 decimal)
6:6  -  Number of mantissa digits (N).

There follow "N" characters of the mantissa; followed by "I"
characters of the exponent (for real type numeric literals only); followed by
one character containing the base (for nondecimal base numeric literals only),
range 2-36 decimal. The last data word is zero filled.

For  numeric  literals  containing  a  radix  point,  the  mantissa  is 
normalized,  that  is  the  exponent  is  recomputed  as  though  the  radix  point  is 
to  the  right  of  the  rightmost  mantissa  digit. 

For nondecimal bases, there must be provision for up to 36 different
digits. The scanner considers 0-9 and A-Z as the 36 digits. The internal
character code for each of the decimal digits 0-9 corresponds exactly to the
"digit value" 0-9. But this is not true for the alphabetic letters. To correct
for this, the input alphabetic character will be converted to a true digit
value in the range 0-35 for storage in the data words. This is accomplished by
subtracting a bias from the character code, depending on the letter:

A-I  subtract  7 
J-R  subtract  14 
S-Z  subtract  22. 
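
The bias table can be written out as a small function. This Python sketch is illustrative only; the function name `digit_value` is an assumption, and the character-code ranges for the three letter groups (17-25, 33-41 and 50-57) are the ones quoted in Section 2.5.

```python
def digit_value(code):
    """Map an internal 6-bit character code to a digit value 0-35.

    Returns None for a code that is neither a decimal digit nor a letter.
    """
    if 0 <= code <= 9:       # decimal digits: the code is already the value
        return code
    if 17 <= code <= 25:     # letters A-I
        return code - 7
    if 33 <= code <= 41:     # letters J-R
        return code - 14
    if 50 <= code <= 57:     # letters S-Z
        return code - 22
    return None
```

The three biases close the gaps between the letter groups in the character code, so that the 36 digit characters map onto the contiguous range 0-35.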

Consider, as an example, the hexadecimal numeric literal
3A42E.569@-354(16). The semantic descriptor would be composed as follows:

0:1  =0 

1:1  =1 ,  nondecimal  base 

2:1  =1,  real 

3:1  =1,  negative  exponent 

4:2  =3,  3  digits  of  exponent 

6:6  =8,  8  digits  of  mantissa 

This produces for the first 12 bits 011111 001000, or, as six-bit characters
(codes 31 and 8), "←8". This will produce the following data words:

Data word 1:     ←  8  3  #  4  2  >  5
Data word 2:     6  9  3  5  7  +  0  0

Note that the exponent has been changed from 354 to 357, representing
the normalization; that the base is represented by "+", or 16 decimal; that
the second word is padded with zeros; that hex A (character code 17) is
converted into "#" (character code 10); and that hex E (character code 21) is
converted into ">" (character code 14).

The  simple  decimal  integer  1  would  be  converted  for  storage  to: 

0:1  =0 

1:1  =0,  decimal  base 

2:1  =0,  integer 

3:1  =0,  positive  exponent 

4:2  =0,  no  exponent  part 

6:6  =1,  1  mantissa  digit. 

This  produces  the  six  bit  characters  "01",  and  the  following  data  word: 

Data  word  1:  0  1  1  0  0  0  0  0
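
Both worked examples can be reproduced with a short Python sketch of the normalization. This is an illustration under stated assumptions, not the scanner's algorithm: the function name `encode_numeric` and its argument conventions are invented here, digits are assumed to have been converted to digit values already, and the exponent is assumed to be stored as the decimal digits of its magnitude.

```python
def encode_numeric(mantissa, fraction_digits=0, exponent=0, base=10, real=False):
    """Return the 6-bit character values of a numeric literal's data words.

    mantissa:        digit values with the radix point removed
    fraction_digits: how many of those digits followed the radix point
    """
    if not real and (fraction_digits or exponent):
        raise ValueError("integers carry no fraction or exponent")
    exponent -= fraction_digits                # radix point moves to the far right
    # Assumption: a real literal always stores at least one exponent digit.
    exp_digits = [int(d) for d in str(abs(exponent))] if real else []
    if len(exp_digits) > 3 or len(mantissa) > 63:
        raise ValueError("literal too long for the descriptor fields")
    first = (int(base != 10) * 16          # bit 1: base indicator
             + int(real) * 8               # bit 2: numeric type
             + int(exponent < 0) * 4       # bit 3: sign of exponent
             + len(exp_digits))            # bits 4-5: number of exponent digits
    chars = [first, len(mantissa)] + list(mantissa) + exp_digits
    if base != 10:
        chars.append(base)                     # one character holding the base
    chars += [0] * (-len(chars) % 8)           # zero fill the last data word
    return chars


hex_example = encode_numeric([3, 10, 4, 2, 14, 5, 6, 9], 3, -354, 16, True)
one = encode_numeric([1])
```

Grouping `hex_example` eight values at a time gives the two data words shown for the hexadecimal literal, and `one` is the single data word shown for the integer 1.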

The ALPHA procedure SCAN is called by the parser (and recursively
from within the scanner itself) when a new terminal symbol is required. The
48-bit value assigned to SCAN as a function is referred to as the SCAN
descriptor, and has the format shown in Figure 2.

For keywords <*R>, the symbol number is assigned by the syntax
preprocessor, starting with 66 (decimal). Special single characters have a
symbol number equal to their internal 6-bit character code, thus varying from
0 (numeral zero) to 63 ("). This allows the keywords and the special single
characters to be considered the same in the parsing routines.

Terminal class               [0:2]    [2:4]        [6:12]                  [18:13]              [31:4]       [35:13]

<*I>, Identifiers            Unused   Class = 1    BIGTAB Semantic Part    Pointer to BIGTAB    Class = 1    Pointer to BIGTAB
<*N>, Numeric Literals       Unused   Class = 2    BIGTAB Semantic Part    Pointer to BIGTAB    Class = 2    Pointer to BIGTAB
<*S>, String Literals        Unused   Class = 3    BIGTAB Semantic Part    Pointer to BIGTAB    Class = 3    Pointer to BIGTAB
<*R>, Keywords               Unused   Class = 15   Symbol Number           Pointer to BIGTAB    Class = 15   Symbol Number
Special Single Characters    Unused   Class = 15   Symbol Number           Symbol Number        Class = 15   Symbol Number


Figure  2.  Format  of  the  SCAN  Descriptors 
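
The layouts of Figure 2 can be exercised with a small Python sketch. The function names below are assumptions made for illustration; only the field positions and class values are taken from the figure.

```python
WORD_BITS = 48


def put(word, start, length, value):
    """Store value in word.[start:length]; bit 0 is the most significant bit."""
    if not 0 <= value < (1 << length):
        raise ValueError("value does not fit in the field")
    return word | (value << (WORD_BITS - start - length))


def get(word, start, length):
    """Extract word.[start:length]."""
    return (word >> (WORD_BITS - start - length)) & ((1 << length) - 1)


def table_descriptor(symbol_class, semantic, pointer):
    """Descriptor for <*I> (class 1), <*N> (class 2) or <*S> (class 3)."""
    w = put(0, 2, 4, symbol_class)
    w = put(w, 6, 12, semantic)
    w = put(w, 18, 13, pointer)
    w = put(w, 31, 4, symbol_class)
    return put(w, 35, 13, pointer)


def keyword_descriptor(symbol_number, pointer):
    """Descriptor for a keyword <*R>: class 15 plus its symbol number."""
    w = put(0, 2, 4, 15)
    w = put(w, 6, 12, symbol_number)
    w = put(w, 18, 13, pointer)
    w = put(w, 31, 4, 15)
    return put(w, 35, 13, symbol_number)


def special_descriptor(char_code):
    """Descriptor for a special single character: the code is the symbol number."""
    return keyword_descriptor(char_code, char_code)
```

In this sketch the low-order fields [31:4] and [35:13] of a keyword and of a special single character have the same shape (class 15 followed by a symbol number), which is one way to read the remark that the parsing routines can treat the two alike.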



2.4  MACROTAB  -  The  Macro  Text  Table 

2.4.1  Concept  of  a  Definitional  Facility 

Many compilers in current use (B-5500 ALGOL, IBM PL/I, JOVIAL, etc.)
have a definition facility, that is, a capability to define compile-time
procedure-like constructs. One can compare a text-type definition or macro
facility with a run-time procedure construct as follows:

A  procedure 

a)  is  considered  syntactically  as  a  complete  <statement>  or 
<primary>; 

b)  produces  one  set  of  machine  code  that  may  be  executed  by 
jumps  and  parameter  linkages  from  different  parts  of  the 
main  program. 

A  macro 

a)  may  be  an  incomplete  syntactic  fragment  composed  of  a 
sequence  of  terminal  symbols; 

b)  produces  a  separate  set  of  machine  code  for  each  invocation; 

c)  is  strictly  a  compile-time  device  that  is  transparent  to  the 
parsing  and  semantic  portions  of  the  compiler. 

2.4.2  The  TWS  Macro  Facility 

As the TWS was developed using Burroughs B-5500 ALGOL, it became
apparent that the definition facility implemented on this compiler made the
compiler easier to use, and actually allowed local extensions to be implemented
in a rather straightforward manner. Therefore, as a practical matter a similar
parameterized macro expander was included as a part of the core compiler for
all TWS-written compilers.

The  storage  scheme  selected  was  to  store  the  macro  text  as  the  entire 
SCAN  descriptor,  with  two  header  words  for  each  definition.  Some  elements  of 
the  storage  scheme  are: 

a)  one  48-bit  word  per  terminal  symbol; 

b)  the  scope  of  identifiers  (a  semantic  concept)  used  in  the 
macro  text  will  be  defined  at  the  point  in  the  program  where 
the  macro  is  declared,  since  the  SCAN  descriptor  includes  the 
BIGTAB  semantic  part  at  the  time  the  macro  was  declared. 

c) accessing the pre-stored macro text by the scanner may be
faster than scanning the text from the source string, as
the time-consuming assembling of the characters into the
numeric strings and identifier strings, and table lookup in
BIGTAB, is performed only once, no matter how many times the
macro is invoked;

d) block structure considerations are made to allow an identifier
to be defined, for example, as a label in one block
and redefined as a macro in an inner block with the old
semantic definition being restored upon block exit;

e) a defined identifier may be redefined within the block in
which it was declared, in which case the new text will replace
the old text for subsequent invocation (this implies
that the macro declaration does not necessarily have to be
placed in the block head for a block-structured language);

f) no parsing or syntax checking of the text is made until the
defined identifier is invoked;

g) defined identifiers (i.e., calls on other macros) may occur
anywhere within the macro text, but the value is defined at
the point of the macro declaration;

h) defined identifiers may occur anywhere within the actual
parameter part of any macro call;

i) no recursion (i.e., one macro directly or indirectly calling
itself) is permitted.

Appendix A describes in detail the syntax and semantics of the
elements of the macro facility.

2.4.3  MACROTAB  Descriptors 

As the macro text is processed by the scanner from the DEFINE
declaration, two header words are set up in MACROTAB:

WORD1:    [0:6]     Unused
          [6:12]    Address of Return Descriptor
          [18:12]   Address of Actual Parameter Table
          [30:12]   Number of Parameters
          [42:6]    Unused

WORD2:    [0:16]    BIGTAB Semantic Part of Defined Identifier
          [16:13]   Pointer to BIGTAB Address of Defined Identifier
          [29:12]   Link to Previous MACROTAB Entry
          [41:7]    Block Nesting Level


The text is then scanned (by SCAN) into the table, one word per
terminal symbol. If a defined identifier is encountered in the source string, a
special macro call descriptor is inserted into the table. If a formal parameter
is encountered, a special formal parameter descriptor is inserted. Finally, at
the end of the text, a return descriptor is inserted:


SCAN DESCRIPTOR:
          [0:2]     Unused
          [2:4]     Class
          [6:12]    Symbol #
          [18:13]   BIGTAB Pointer for <*I> <*N> <*S>
          [31:4]    Class
          [35:13]   BIGTAB Pointer

MACRO CALL DESCRIPTOR:
          [0:2]     Unused
          [2:4]     Class = 8
          [6:12]    Pointer to Called Macro
          [18:12]   Address of Where to Continue after the Call
          [30:12]   Contents of Called Macro's Return Word

RETURN DESCRIPTOR:
          [0:2]     Unused
          [2:4]     Class = 9
          [6:12]    Where to Continue Processing upon Return; =0 Means Outermost Macro
          [18:12]   Address of Macro Call Descriptor

FORMAL PARAMETER DESCRIPTOR:
          [0:2]     Unused
          [2:4]     Class = 10
          [6:12]    Address of Macro Header Word
          [18:12]   Parameter Number

When  the  macro  is  invoked,  the  actual  parameters  must  be  stored,  in  a 
manner  similar  to  the  macro  itself,  as  scan  descriptors.  In  addition  to  the 
stored  text,  for  each  actual  parameter,  there  will  be  one  return  descriptor  as 
described  above  plus  one  parameter  address  and  length  word  for  each  two  actual 
parameters. 


ADDRESSES AND LENGTH DESCRIPTOR:
          [0:12]    First Length
          [12:12]   First Address
          [24:12]   Second Length
          [36:12]   Second Address
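
A short Python sketch shows how the parameter address and length words could be filled, two actual parameters per word. The function name `pack_pairs` and the representation of each parameter as a (length, address) pair are assumptions made for illustration; only the four field positions come from the layout above.

```python
WORD_BITS = 48


def pack_pairs(params):
    """Pack (length, address) pairs two per 48-bit word.

    Slot 0 uses [0:12] and [12:12]; slot 1 uses [24:12] and [36:12].
    """
    words = []
    for i in range(0, len(params), 2):
        word = 0
        for slot, (length, address) in enumerate(params[i:i + 2]):
            if not (0 <= length < 4096 and 0 <= address < 4096):
                raise ValueError("length and address must fit in 12 bits")
            word |= length << (WORD_BITS - 12 - 24 * slot)
            word |= address << (WORD_BITS - 24 - 24 * slot)
        words.append(word)
    return words
```

With an odd number of actual parameters, the second half of the last word is simply left zero in this sketch.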


As the formal parameters used within the macro definitions are strictly
local, provision has been made to use the high-order end of the macro table
(from location 4095 down) as temporary storage for the semantic part of the
parameter identifiers during scanning of the macro text. This semantic part is
then restored when the MEND terminating the definition is scanned:


FORMAL PARAMETER SAVE WORDS:
          [0:16]    BIGTAB Semantic Part
          [16:13]   BIGTAB Pointer

2.5  CHARCLASS  -  The  Character  Class  Table

The  term  "terminal  head  symbol"  refers  to  the  first  character  of  a 
terminal  symbol.  The  scanner  needed  a  way  to  determine  from  the  terminal  head 
symbol  what  class  of  terminal  symbol  was  to  follow.  For  example,  a  decimal 
digit,  radix  point  or  exponent  sign  will  indicate  a  numeric  literal  must  be 
formed  from  the  following  characters.  Similarly,  a  string  quote  indicates  a 
string  literal  follows.  For  this  and  other  decision  points  in  the  scanner,  a 
table  of  character  classes  has  been  established,  assigning  to  each  six-bit  BCD 
character  a  bit  string: 


Character Class                             Class     CHARCLASS Bit Positions
                                            Value     41  42  43  44  45  46  47

Digits 0-9                                   58        0   1   1   1   0   1   0
Special Keyword Delimiter (#)                 4        0   0   0   0   1   0   0
Numeric Literal Exponent Delimiter (@)       18        0   0   1   0   0   1   0
Radix Point (.)                              34        0   1   0   0   0   1   0
Numeric Literal Base Delimiter (()           64        1   0   0   0   0   0   0
String Quote (")                              3        0   0   0   0   0   1   1
All Other Special Symbols                     0        0   0   0   0   0   0   0
Letters A-I                                  89        1   0   1   1   0   0   1
Letters J-R                                 105        1   1   0   1   0   0   1
Letters S-Z                                 121        1   1   1   1   0   0   1


Table  1.  CHARCLASS  Table 


When  certain  lexical  decisions  must  be  made  about  a  character  in  the 
source  string,  a  branch  is  made,  indexed  by  some  subset  of  the  bits  in  the 
CHARCLASS  entry. 
When the scanner is ready to search for a new terminal symbol, a
branch is made on bits [45:3] of the CHARCLASS of the terminal head symbol,
with the following results:

[45:3] Value    Characters in This Class

1               Letters A-Z. Identifier follows.

2               @, ., Digits 0-9. Numeric literal follows.

3               ". String literal follows.

4               #. If the special word option for keywords
                was chosen, a language keyword follows.

0               All other special characters. Process as a
                special single character.

During  the  assembly  of  a  numeric  literal,  when  it  is  known  the 
terminal  head  symbol  has  bits  [45:3]=2,  a  further  branch  is  made  on  bits  [42:2], 
with  the  following  results: 


[42:2] Value   Characters in This Class

    1          @. Exponent delimiter.

    2          . Radix point.

    3          Digits 0-9.

These  correspond  to  special  processing  depending  on  the  numeric  type. 

Later,  during  assembly  of  the  interior  symbols  of  the  numeric  literal, 
a  branch  is  made  on  bits  [41:3]  of  the  incoming  source  symbol,  with  the 
following results:

[41:3] Value   Characters in This Class

    1          @. Branch to exponent logic.

    2          . Branch to fractional digit logic.

    3          Digits 0-9. Assemble in normal manner.

    4          (. Branch to base logic.

    5          A-I. Branch to logic to convert character
               codes 17-25 to true digit values 10-18.

    6          J-R. Branch to logic to convert character
               codes 33-41 to true digit values 19-27.

    7          S-Z. Branch to logic to convert character
               codes 50-57 to true digit values 28-35.

    0          All other special symbols. Terminate numeric
               literal processing.
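The three letter branches each subtract a fixed offset from the character code. The following is a minimal Python sketch of that conversion, assuming only the code ranges quoted in the table above; the function name `letter_digit_value` is hypothetical and is not taken from the scanner.

```python
def letter_digit_value(code: int) -> int:
    """Map a six-bit BCD letter code to its digit value (10-35).

    Ranges and results follow the [41:3] branch table; the offsets are
    simply the differences between the quoted code and value ranges.
    """
    if 17 <= code <= 25:        # letters A-I
        return code - 7         # 17..25 -> 10..18
    if 33 <= code <= 41:        # letters J-R
        return code - 14        # 33..41 -> 19..27
    if 50 <= code <= 57:        # letters S-Z
        return code - 22        # 50..57 -> 28..35
    raise ValueError(f"code {code} is not a letter code")
```

Because the three code ranges are not contiguous, one subtraction cannot serve all letters, which is presumably why the table gives each range its own branch.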

During  assembly  of  the  alphanumeric  internal  characters  of  an  iden- 
tifier, CHARCLASS  bit  [44:1]  is  used  to  indicate  an  alphanumeric  character. 

[44:1]  Value  Characters  in  This  Class 

1  Digits  0-9,  letters  A-Z.  Assemble  into  the 
identifier. 

0         All  other  characters.  Terminate  identifier 
processing. 
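The branches described above all extract a partial-word field [start:length] from a CHARCLASS entry. The sketch below reproduces the branch values from the class values in Table 1. It assumes a 48-bit word with bit 0 leftmost and bit 47 rightmost, which is consistent with the bit positions 41-47 shown in the table; the names `field` and `CHARCLASS` as a Python dictionary are illustrative only.

```python
WORD_BITS = 48  # assumption: 48-bit word, bit 0 is the leftmost bit

# Class values as listed in Table 1 (keys are descriptive labels).
CHARCLASS = {
    "digit": 58, "keyword_delimiter": 4, "exponent_delimiter": 18,
    "radix_point": 34, "base_delimiter": 64, "string_quote": 3,
    "other": 0, "A-I": 89, "J-R": 105, "S-Z": 121,
}

def field(word: int, start: int, length: int) -> int:
    """Return the field [start:length]: `length` bits beginning at bit `start`."""
    shift = WORD_BITS - (start + length)
    return (word >> shift) & ((1 << length) - 1)

# Head-symbol branch on [45:3]
head_branch = {name: field(value, 45, 3) for name, value in CHARCLASS.items()}
```

For example, `field(CHARCLASS["digit"], 45, 3)` gives 2 (numeric literal follows) and `field(CHARCLASS["A-I"], 41, 3)` gives 5, matching the tables above.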

This procedure of using the internal character code of the source
characters to index a table of character classes allows the TWS-supplied control
card options below to be easily implemented:

ALPHABETIC a      Let a have CHARCLASS.[44:4]=9. Thus a can
                  be an identifier terminal head symbol
                  ([45:3]=1) or an internal identifier symbol
                  ([44:1]=1).

ALPHANUMERIC b    Let b have CHARCLASS.[44:1]=1. Thus b can
                  be an internal identifier symbol.

RSWD              Choose the reserved word option for keywords.
                  Let # have CHARCLASS = zero.

SPWD c            Retain the nominal special word option for
                  keywords, but designate c to be the delimiter
                  by setting CHARCLASS c to 4, and reset
                  CHARCLASS # to zero.

EXPONENT d        Change CHARCLASS d to 18; reset CHARCLASS @
                  to zero.


Specifying  "ALPHANUMERIC  -  "  on  a  control  card  would  allow  a  COBOL- 
like  identifier  CASH-IN-FIRST-NATIONAL-BANK. 
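As a rough illustration of why these options are cheap to implement, the sketch below applies each option as a small edit to a character-class table. It is an assumption-laden model, not the scanner's code: the table is a Python dictionary keyed by character, `#` and `@` are taken to be the nominal keyword and exponent delimiters, and `apply_option` is a hypothetical name.

```python
def apply_option(table: dict, option: str, char: str = "") -> dict:
    """Return a copy of `table` with one control card option applied."""
    t = dict(table)
    if option == "ALPHABETIC":
        # Force field [44:4] (the low four bits) to 9: head symbol and internal symbol.
        t[char] = (t.get(char, 0) & ~0b1111) | 9
    elif option == "ALPHANUMERIC":
        # Set bit [44:1]: allowed inside an identifier.
        t[char] = t.get(char, 0) | 0b1000
    elif option == "RSWD":
        t["#"] = 0                      # '#' no longer introduces a keyword
    elif option == "SPWD":
        t["#"] = 0
        t[char] = 4                     # new special keyword delimiter
    elif option == "EXPONENT":
        t["@"] = 0
        t[char] = 18                    # new exponent delimiter
    else:
        raise ValueError(f"unknown option {option!r}")
    return t

nominal = {"#": 4, "@": 18, ".": 34, "(": 64, '"': 3}
cobol_like = apply_option(nominal, "ALPHANUMERIC", "-")
```

With `cobol_like`, the hyphen has bit [44:1] set, so an identifier such as CASH-IN-FIRST-NATIONAL-BANK would be assembled as a single symbol.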
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3.  MAIN  SCANNER  PROCEDURES 

This  chapter  will  give  a  functional  description  of  major  parts  of 
the  lexical  scanner  in  terms  of  the  ALGOL  procedures  that  perform  the  various 
functions. 

3.1   READACARD 

This  procedure  defines  the  basic  format  of  the  source  program  cards 
accepted  by  the  TWS  scanner. 

a)  The  source  program  card  images  are  read  from  the  disc  file 
"SOURCE"  as  80-character  records,  into  an  array  CARDBUF[0] 
to  CARDBUF[9],  appropriately  recognizing  the  end-of-file 
condition. 

b)  The  text  in  card  image  columns  1-72  is  then  transferred  into 
another  array  CHARBUF[1]  to  CHARBUF[72],  one  BCD  character 
per  word. 

c)  The  card  image  is  analyzed  to  identify  leading  and  trailing 
blanks,  setting  items  "FCR"  as  the  column  with  the  first  non- 
blank  character,  "LCR"  as  the  last  non-blank  character,  and 
"NCR"  as  the  moving  character  pointer  initially  set  equal  to 
"FCR". 

d)  The  card  image  counter  "CARDCOUNT"  is  incremented  by  one, 
with  the  current  value  placed  in  CARDBUF[10]  for  printing  as 
columns  81-89  of  the  card  image. 

e)  The  number  in  columns  72-80  of  the  card  image  is  translated 
to  internal  form,  and  made  available  to  the  rest  of  the 
compiler  procedures  as  "CARDNUM".  If  this  field  is  blank  on 
the  first  card  image  in  the  source  stream,  "CARDNUM"  will  be 
subsequently set equal to "CARDCOUNT" for each card image.

f)  If  column  one  is  "$",  the  card  image  is  printed  on  printer 
backup  disc  file  "LINE",  and  control  is  passed  to  the  pro- 
cedure "CONTROLCARD"  for  analysis  of  the  control  card 
information. 

g)  If  the  card  image  was  not  a  control  card,  and  if  the  "PRINT" 
or  "LIST"  control  card  options  had  been  chosen  previously, 
the  contents  of  array  CARDBUF,  including  the  inserted  card 
counter,  is  printed  on  file  "LINE". 

Thus,  externally,  the  following  are  the  user-oriented  TWS  features 
implemented  through  READACARD: 

a)  control  cards  start  with  a  "$"  in  column  one; 

b)  source  text  occupies  columns  1-72; 

c)  columns  72-80  may  contain  a  card  count,  that  will  be  made 
available,  if  the  semantic  routines  store  it,  for  program 
traces. 
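The card-image bookkeeping in steps (b), (c), and (f) can be sketched as follows. This is only an illustration in Python: the function name `read_a_card`, the returned dictionary, and the treatment of an all-blank card (FCR and LCR set to zero) are assumptions, and the card-number field is left out.

```python
def read_a_card(image: str) -> dict:
    """Split one card image into the items READACARD is described as setting."""
    image = image.ljust(80)[:80]          # one 80-character record
    text = image[:72]                     # columns 1-72 carry the source text
    body = text.rstrip(" ")
    if body.strip(" "):
        fcr = len(text) - len(text.lstrip(" ")) + 1   # first non-blank column
        lcr = len(body)                               # last non-blank column
    else:
        fcr = lcr = 0                     # assumption: blank card has no text
    return {
        "is_control": image[0] == "$",    # "$" in column one
        "FCR": fcr,
        "LCR": lcr,
        "NCR": fcr,                       # moving pointer starts at FCR
        "text": text,
    }
```

Columns are numbered from one, as in the text, so a card whose first non-blank character is in the fourth position has FCR = 4.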

3.2  NEXTCHAR 

This  procedure  provides  other  scanner  routines  with  text  one  charac- 
ter at  a  time.  Its  output  is  a  variable  NXTCHR  containing  the  six-bit  BCD  code 
of  the  character  scanned  in  the  source  string.  In  addition,  the  following  func- 
tions are  performed: 

a)  If  a  "%"  is  detected  anywhere  on  the  card  image,  the  rest  of 
the  card  image  is  ignored.  This  is  the  basic  COMMENT  capa- 
bility provided  by  the  TWS.  An  ALGOL-like  facility  may  be 
implemented  by  the  semantics,  with  the  caveat  that  all  the 
text  in  the  comment  would  be  scanned,  with  all  words  and 
numbers placed into BIGTAB.

b) When not internal to a string literal, multiple blanks in
the  source  stream  are  reduced  to  a  single  blank  for  parsing 
purposes. 
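A minimal sketch of these two functions, written as a Python generator over the text of one card image. The name `next_chars` and the assumption that the string quote is the character `"` are illustrative; the real procedure also maintains NCR and moves to the next card image, which is omitted here.

```python
def next_chars(card_text: str, quote: str = '"'):
    """Yield the characters of one card image as NEXTCHAR is described to.

    - everything from the first "%" onward is dropped (comment);
    - outside string literals, a run of blanks is reduced to one blank.
    """
    text = card_text.split("%", 1)[0]     # "%" anywhere ends the useful text
    in_string = False
    previous_blank = False
    for ch in text:
        if ch == quote:
            in_string = not in_string     # a doubled quote toggles twice
        if ch == " " and not in_string:
            if previous_blank:
                continue                  # skip the rest of a blank run
            previous_blank = True
        else:
            previous_blank = False
        yield ch
```

For instance, `"".join(next_chars("A   :=  B % note"))` gives `"A := B "`, while blanks inside a quoted string are passed through unchanged.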

3.3  Boolean  Procedure  TABLESEARCH 

Prior  to  the  execution  of  TABLESEARCH,  other  procedures  will  have 
been  run  to  assemble  from  the  source  string  one  of  the  classes  <*I>,  <*R>, 
<*S>, or <*N> of multi-character terminal symbols into SYMBUF[0] to SYMBUF[7],
in  the  format  of  the  BIGTAB  data  words  described  in  section  2.2.  The  value  of 
the  procedure  will  be  TRUE  if  a  successful  BIGTAB  table  lookup  has  been  made, 
meanwhile  setting  the  global  variable  NEXTSYM  to  be  the  SCAN  descriptor  (see 
section  3.2).  The  value  is  set  to  FALSE  if  a  macro  definition  or  call  is  en- 
countered, indicating  the  main  SCAN  procedure  must  then  either  assemble  further 
text  from  the  source  string,  or  obtain  SCAN  descriptors  from  the  macro  table. 

The  TABLESEARCH  procedure  has  three  main  portions  that  will  be  de- 
scribed in  detail:  table  lookup  in  BIGTAB,  processing  of  an  entry  already  in 
BIGTAB,  and  insertion  of  a  new  entry  into  BIGTAB. 

3.3.1  Table  Lookup 

BIGTAB  is  a  straightforward  binary  tree  structure.  The  basic  algo- 
rithm below  does  not  reflect  the  complication  in  the  scanner  that  is  required 
by  the  fact  that  entries  to  be  compared  may  be  of  different  lengths,  extending 
over one to eight words.

Given   SYMBUF[0] to SYMBUF[7] containing an entry in BIGTAB format;
        TYP, the terminal symbol class (1, 2, 3, 15) of the entry;
        NWDCH, the length of the SYMBUF entry;
        LEFTPOINTER and RIGHTPOINTER referring to pointers in the
        BIGTAB entry head.

L1 (Find root)            Set ENTRYPTR ← BIGTAB[TYP].

L2 (Test if link null)    Is ENTRYPTR = 0? Yes, go to EXIT1.

L3 (Compare SYMBUF        If SYMBUF = BIGTAB entry, and length of SYMBUF =
    with BIGTAB)          length of BIGTAB entry, then go to EXIT2.

L4 (SYMBUF < BIGTAB)      If SYMBUF < BIGTAB entry,
                          set ENTRYPTR ← BIGTAB[ENTRYPTR].LEFTPOINTER,
                          set K ← 1, go to L2.

L5 (SYMBUF > BIGTAB)      Set ENTRYPTR ← BIGTAB[ENTRYPTR].RIGHTPOINTER,
                          set K ← 2, go to L2.

EXIT1                     Entry not in BIGTAB. See section 3.3.3.

EXIT2                     Entry already in BIGTAB, ENTRYPTR is location
                          of head node. See section 3.3.2.

Figure 3. Table Lookup Algorithm

3.3.2  Processing  of  an  Entry  Already  in  BIGTAB 

If  the  symbol  scanned  in  the  source  string  is  already  in  BIGTAB,  three 
cases  must  be  distinguished: 

a)  The  keyword  DEFINE  is  encountered.  A  macro  definition 

follows.  Process  the  text  into  MACROTAB  format  (see  section 
2.4).  Exit  TABLESEARCH,  indicating  an  unsuccessful  table 
lookup.

b) A defined identifier indicating a macro call is encountered.
Process  the  actual  parameters  of  the  call  (see  section  2.4) 
if  any,  change  the  scan  mode  to  indicate  subsequent  scan 
descriptors  are  to  come  from  the  macro  table.  Exit 
TABLESEARCH,  indicating  an  unsuccessful  table  lookup. 

c)  A  normal  entry  is  encountered.  Build  the  scan  descriptor 
NEXTSYM  according  to  the  symbol  class.  Exit  TABLESEARCH, 
indicating  successful  table  lookup. 

3.3.3  Insertion  of  a  New  Entry  Into  BIGTAB 

When  the  symbol  scanned  in  the  source  string  is  not  found  in  the 
BIGTAB  table  lookup,  it  must  be  then  inserted  into  the  symbol  table: 

a)  The  head  word  is  created,  inserting  only  the  number  of 
characters.  The  semantic  routines  will  set  the  semantic 
part,  and  both  right  and  left  tree  links  will  be  empty  when 
the  entry  is  created. 

b)  The  head  word  and  the  data  word(s)  are  inserted  into  the 
table  in  the  next  available  sequential  location.  A  check 
is  made  to  insure  that  the  complete  entry  will  fit  into  one 
array  row,  as  to  split  elements  of  one  symbol  across  array 
rows  would  cause  undue  overhead  due  to  array  row  segmenta- 
tion in  the  B-5500. 

c) It was found that the entry was not in BIGTAB when either
the right or the left pointer of the entry head in location
ENTRYPOINTER was null. If K was set to one in L4, then
the left pointer of the entry head at location ENTRYPOINTER
must be set to point to this new entry address. Otherwise,
K was set to two in L5, so the right pointer must be set to
point to the new entry.

d) Build the scan descriptor NEXTSYM according to the symbol
class. Exit TABLESEARCH, indicating a successful table
lookup.
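The lookup of Figure 3 and the insertion steps above can be combined in one short sketch. This is a simplified Python model under stated assumptions: each node is a three-element list `[key, left, right]` rather than a multi-word BIGTAB entry, index 0 is reserved so that 0 can mean a null link, the roots are kept in a separate dictionary, and the parent of the null link is tracked explicitly (the figure does not show this bookkeeping). The name `table_search` is hypothetical.

```python
def table_search(tab: list, roots: dict, typ: int, key: str):
    """Look `key` up in the tree for class `typ`; insert it if absent.

    Returns (index, found), where `found` is False for a new entry.
    """
    ptr = roots.get(typ, 0)              # L1: find root
    parent, k = 0, 0
    while ptr != 0:                      # L2: test if link null
        node_key = tab[ptr][0]
        if key == node_key:              # L3: equal, EXIT2
            return ptr, True
        parent = ptr
        if key < node_key:               # L4: go left, remember K = 1
            ptr, k = tab[ptr][1], 1
        else:                            # L5: go right, remember K = 2
            ptr, k = tab[ptr][2], 2
    tab.append([key, 0, 0])              # EXIT1: next sequential location
    new = len(tab) - 1
    if parent == 0:
        roots[typ] = new                 # first entry of this class
    else:
        tab[parent][k] = new             # K = 1 sets left, K = 2 sets right
    return new, False
```

Starting from `tab = [None]` and `roots = {}`, inserting IF, BEGIN, THEN, END in that order places BEGIN to the left of IF, THEN to the right, and END as the right child of BEGIN.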

3.4  Procedures  to  Build  BIGTAB  Entries 

There exist three major procedures, NUMERICLIT, STRINGET, and
ALPHAGET, to assemble into SYMBUF[0] to SYMBUF[7] the numeric literal, string
literal and identifier data types. They are functionally described by their
outputs in section 2.2.

3.5  The  Macro  Facility 

Section  2.4  describing  the  data  storage  for  the  macro  text  gives  an 
adequate  functional  description  of  the  procedures  PROCESSMACRODECLARATION, 
MACROINVOCATION,  and  PROCESSMACROACTUALPARAMETERPART.  This  section  will  dis- 
cuss the  procedure  GETDESCRIPTORFROMMACROTAB,  to  illustrate  how  it  "executes" 
the  descriptors  placed  in  the  macro  table  by  the  other  procedures. 

The major concept in designing this descriptor-based macro system
was that the descriptors could be considered as "instructions" directing the
flow of data from the table, to be "interpreted" by the Alpha procedure
GETDESCRIPTORFROMMACROTAB. This design allows nearly arbitrary text in the
parameters of the macro invocation, and further allows arbitrary (except
recursive) macro invocations either within the text or within an actual
parameter. If a formal parameter or a call on another macro is detected
during scanning of the text in the declaration, a special "jump instruction" is
placed in the sequential macro table to direct the flow of data. At the end of
the macro text itself, and at the end of an actual parameter, a "return" word
is inserted to direct the flow back to the point where it was interrupted.

The procedure is called from the SCAN procedure when a new symbol
is  needed  in  SCANMODE  5.  The  global  variable  NEXTMACRO  contains  the  macro 
table  address  of  the  next  sequential  entry  to  be  chosen.  The  macro  table 
entry  at  this  address  is  examined  and  "executed",  depending  on  the  class 
field,  bits  [2:4]  of  the  entry: 

Class  Value     Action 

1,  2,  3,  15     Normal  SCAN  descriptor. 

Set GETDESCRIPTORFROMMACROTAB ← MACROTAB[NEXTMACRO].
Increment  NEXTMACRO  by  one.  Exit  procedure. 

8  Macro  invocation  descriptor. 

1)  Set  up  called  macro's  return  descriptor.  Return 
location  is  either  NEXTMACRO  +  1  or  the  address  following 
the  actual  parameter  table.  This  location  has  been  in- 
serted in  bits  [18:12]  of  the  macro  invocation  descriptor 
by  the  procedure  PROCESSMACRODECLARATION. 

2)  Set  in  the  called  macro's  entry  head  word  one  the  ad- 
dress of  the  actual  parameter  table.  If  parameters  are 
present,  their  location  will  be  NEXTMACRO  +  1. 

3)  Set  NEXTMACRO  to  the  first  word  of  the  called  macro's 
text,  located  immediately  following  the  second  header  word. 

4)  Branch  to  code  to  examine  a  new  macro  table  entry. 

9  Return  descriptor. 

Either  a  complete  macro  call  or  an  actual  parameter  has 
been  "executed".  Consider  the  following  two  cases: 
1) Return address is zero. This is true only for an
outermost macro call. Set GETDESCRIPTORFROMMACROTAB ← 0,
set SCANMODE ← 0. Set PTMACROTAB (the pointer to the next
available location for insertion of new text) to the
value it had at the time the defined identifier was
scanned in the source string. This will "erase" the
actual parameters stored for this call. Exit from the
procedure.

2) Return address is not zero. Set NEXTMACRO ← return
address, branch to code to examine a new macro table entry.

10  Formal  parameter. 

Extract  from  descriptor  bits  [6:12]  the  address  of  the 
macro  head  word  one.  From  bits  [18:12]  of  the  head  word, 
extract  the  location  of  the  actual  parameter  table  as  set 
when  the  actual  parameters  were  scanned.  From  bits  [18:12] 
of  the  formal  parameter  descriptor,  extract  the  parameter 
number.  Determine  from  the  addresses  and  lengths  de- 
scriptor in  the  actual  parameter  table  the  location  of  the 
specific  actual  parameter  needed,  as  well  as  the  address  of 
its return descriptor. Set the actual parameter return
address to NEXTMACRO + 1. Set NEXTMACRO ← actual parameter
address. Branch to code to examine a new macro table entry.

Note  that  none  of  the  "special"  descriptors  cause  output  from  the  pro- 
cedure, but  just  a  redirecting  of  the  flow,  followed  by  "execution"  of  the  de- 
scriptor in  the  new  location.  Note  also  that  this  procedure  will  work  on  either 
empty  macros  or  empty  parameters.  In  both  cases,  the  stored  text  will  be  simply 
a  return  descriptor. 
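The "instruction" view of the macro table can be made concrete with a small interpreter. The Python sketch below is a loose model, not the B-5500 layout: entries are dictionaries instead of packed words, each macro has a single head entry rather than two header words, the head records where its own return descriptor lives, and the actual parameter table is a list of (text address, return-descriptor address) pairs. All field names, and the name `expand`, are hypothetical. The class numbers 8, 9, and 10 follow the table above, and 1 stands in for the normal classes 1, 2, 3, and 15.

```python
SYM, CALL, RET, PARAM = 1, 8, 9, 10

def expand(tab: list, start: int):
    """Yield ordinary descriptors while interpreting `tab` from index `start`."""
    nxt = start
    while True:
        entry = tab[nxt]
        cls = entry["class"]
        if cls == SYM:                        # normal descriptor: output it
            yield entry["value"]
            nxt += 1
        elif cls == CALL:                     # macro invocation descriptor
            head = tab[entry["head"]]
            tab[head["ret"]]["addr"] = entry["return_to"]
            if entry["has_params"]:
                head["params"] = nxt + 1      # parameter table follows the call
            nxt = entry["head"] + 1           # first word of the called text
        elif cls == RET:                      # return descriptor
            if entry["addr"] == 0:            # outermost call is finished
                return
            nxt = entry["addr"]
        elif cls == PARAM:                    # formal parameter
            head = tab[entry["head"]]
            text_at, ret_at = tab[head["params"]]["slots"][entry["number"]]
            tab[ret_at]["addr"] = nxt + 1     # come back after this descriptor
            nxt = text_at
        else:
            raise ValueError(f"unexpected class {cls}")

# INCR(X) = X <- X + 1, invoked from the source text as INCR(W)
TABLE = [
    {"class": 0, "params": 7, "ret": 6},          # 0: head of INCR
    {"class": PARAM, "head": 0, "number": 0},     # 1: X
    {"class": SYM, "value": "<-"},                # 2
    {"class": PARAM, "head": 0, "number": 0},     # 3: X
    {"class": SYM, "value": "+"},                 # 4
    {"class": SYM, "value": "1"},                 # 5
    {"class": RET, "addr": 0},                    # 6: end of macro text
    {"class": 0, "slots": [(8, 9)]},              # 7: actual parameter table
    {"class": SYM, "value": "W"},                 # 8: actual parameter text
    {"class": RET, "addr": 0},                    # 9: end of actual parameter
]

print(" ".join(expand(TABLE, 1)))   # prints: W <- W + 1
```

Only the `SYM` branch produces output; the other three branches merely redirect `nxt`, which mirrors the remark above that the "special" descriptors cause no output. An empty actual parameter would be a lone return descriptor, and the loop handles it without a special case.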
3.6 Alpha Procedure SCAN

SCAN  is  the  procedure  that  controls  the  actions  of  all  the  other  pro- 
cedures mentioned  above.  It  is  the  prime  interface  with  the  parsing  routines, 
and  is  called  when  a  new  terminal  symbol  is  required.  The  procedures  to  build 
the  macro  table  are  declared  in  the  SCAN  procedure  head,  and  thus  call  SCAN  re- 
cursively to  obtain  the  descriptors  to  store  in  the  macro  table. 

The  value  of  SCAN  as  a  function  is  normally  the  scan  descriptor  as 
discussed  at  length  in  section  2.4.  In  SCANMODE  4,  its  value  will  be  the 
contents  of  SYMBUF[0]. 

There  are  several  modes  of  operation  of  SCAN,  depending  on  the  way 
the  source  characters  are  to  be  assembled.  Setting  of  the  global  variable 
SCANMODE  prior  to  call  of  SCAN  will  cause  one  of  the  following  actions  to  be 
taken: 

SCANMODE     Action  Taken  by  the  Scanner 

0  Normal  operational  mode.  Ignore  all  embedded  blanks  out- 
side of  string  literals.  Return  normal  SCAN  descriptors. 

1  As  in  SCANMODE  0,  but  reduce  adjacent  embedded  blanks  to 
one  blank  and  report  as  a  single  special  symbol. 

2  Scan  the  text  between  FCR  and  LCR.  Return  a  descriptor  on 
each  character  in  the  source  string  as  a  single  special 
character  SCAN  descriptor,  but  ignore  blanks. 

3  As  SCANMODE  2,  but  reduce  adjacent  blanks  to  one  and  report. 

4  As  SCANMODE  0,  but  return  contents  of  SYMBUF[0]  -  i.e.,  the 
first  BCD  characters  of  the  terminal  symbol  -  as  the  SCAN 
descriptor.  Do  not  look  up  or  enter  the  symbol  in  BIGTAB. 

5  Fetch SCAN descriptor from the macro table.

When in SCANMODE 0, 1 or 4, a branch is made on CHARCLASS[45:3] of
the terminal head symbol to define whether an identifier, a numeric literal,
a string literal, a special word keyword or a single special symbol follows.
Based on the specific branch made, the terminal symbol is assembled in the
proper format into the array SYMBUF. In SCANMODE 0 or 1, a BIGTAB table
lookup is performed, obtaining the BIGTAB semantic part and address of the
symbol. With this information, the SCAN descriptor is assembled.

3.7  Integer  Procedure  SHAKEOUTBIGTAB 

It  was  noticed  when  working  with  the  initial  BIGTAB  produced  by  the 
1969  version  of  TRANQUIL  that  there  was  quite  a  large  imbalance  in  the  tree 
structure.  Specifically,  the  initial  BIGTAB  contained  198  entries  consisting 
of  109  keywords  and  89  language  terminals.  A  reflection  on  the  properties  of 
binary trees shows that in the worst case, all nodes could be "strung out",
requiring 198 levels, and in the best case, eight levels (⌈log2 198⌉). The
importance  in  the  number  of  levels  is  in  the  speed  of  lookup--the  more  levels 
to  the  tree,  the  more  comparisons  that  must  be  made  to  find  an  entry  in  the 
tree. 

On  the  198-entry  tree  actually  produced  by  the  syntax  preprocessor,  the 
level number of the nodes varied from one (for the head of the tree) to
eighteen. The average level of all 198 nodes was nine. For comparison, a fully
balanced binary tree with eight levels can contain 255 nodes, with a
maximum level of eight and an average level of 7.03.
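The figures quoted for the full eight-level tree can be checked directly: level l of a full binary tree holds 2^(l-1) nodes, so the average level is the node-weighted mean of the level numbers. A short Python check follows; the function name `average_level` is illustrative.

```python
import math

def average_level(levels: int) -> float:
    """Average level number over all nodes of a full binary tree."""
    total_nodes = 2 ** levels - 1
    weighted = sum(level * 2 ** (level - 1) for level in range(1, levels + 1))
    return weighted / total_nodes

nodes_in_full_tree = 2 ** 8 - 1                 # 255
best_case_levels = math.ceil(math.log2(198))    # 8
avg = average_level(8)                          # 1793 / 255, about 7.03
```

The weighted sum for eight levels is 1793, and 1793 / 255 rounds to 7.03, in agreement with the text.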

An  algorithm  was  developed  by  this  author  that  will  balance  the  tree 
structure  of  any  input  BIGTAB-type  tree,  modifying  the  left  and  right  tree 
pointers, but leaving all nodes in the same locations as previously. The
maximum level of the balanced tree will be ⌈log2(N+1)⌉, where N is the total
number of nodes in the tree to be balanced.

Before  the  algorithm  is  discussed  in  detail,  some  observations  can  be 
made  about  the  structure  of  the  balanced  tree.  Given  the  nodes  in  lexical 
order  in  the  sequential  table  TAB[1]  to  TAB[N],  the  tree  developed  by  this 
algorithm  will  have  all  odd  TAB  entries,  i.e.,  TAB[1],  TAB[3],  TAB[5],  etc., 
as terminal nodes, with both left and right tree pointers null. Conversely,
all  even  TAB  entries,  i.e.,  TAB[2],  TAB[4],  etc.,  will  have  at  least  one  non- 
null  link.  Furthermore,  as  the  tree  is  "grown",  the  left  sub-tree  of  any  node 
will  always  be  complete.  If  the  tree  is  not  full,  it  will  be  the  right  sub- 
trees that  will  be  partially  empty.  Figure  4  illustrates  these  points. 

The  essence  of  the  algorithm  is  to  order  the  nodes  into  a  linear  list 
TAB,  and  then  to  visit  each  node  on  each  level  sequentially,  from  left  to 
right,  computing  and  setting  the  new  BIGTAB  tree  links  as  each  node  is  visited. 
To control sequencing of the algorithm, a queue is constructed, being
initialized both front and rear with the new head node. When the right and left
tree links are computed for a node, the TAB addresses of these sub-nodes are
inserted into the rear of the queue. This results in the visit of all nodes
in a certain level before progressing to the next lower level.

Use  of  this  procedure  has  been  made  a  control  card  option.  If  "BALANCE" 
appears  on  a  control  card,  the  initial  BIGTAB  is  balanced.  Thereafter,  the 
procedure  may  be  called  from  procedure  TABLESEARCH,  whenever  a  BIGTAB  array 
row  fills  up. 

On  a  series  of  benchmark  tests  using  a  1943-card-image  input  deck,  using 
the  1969  TRANQUIL  BIGTAB,  the  balancing  added  about  2.8  seconds  to  the  two 
minute  total  scan  time.  But  use  of  the  balanced  BIGTAB  was  able  to  increase 
throughput  of  the  scanner  between  six  and  seventeen  percent  over  that  using  the 
unbalanced BIGTAB.

Figure 4. Examples of Balanced Tree Structures

ALGORITHM B - Balance the Tree Structure

Let BIGTABWORD be considered a pointer with bits [22:13] as a left
pointer field, and bits [35:13] as a right pointer field. Let each TAB entry
have three fields: bits [15:10] as the queue link, bits [25:10] as the delta
field, and bits [35:13] as the BIGTAB address.

B1   Traverse the tree in postorder (see Algorithm T, Knuth [6],
     page 317), using an auxiliary stack, placing the BIGTAB
     addresses of the nodes visited into a sequential array TAB[1]
     to TAB[N]. (This is the symmetric-order traversal, now usually
     called inorder, so the nodes arrive in lexical order.)

B2   Find the size of the smallest fully balanced tree that has
     at least N nodes. Let I be this number, with a
     value 1, 3, 7, 15, 31, ..., 2^m - 1 (m ≥ 1). The exponent m is the
     maximum number of levels in the balanced tree.

B3   Set SHAKEOUTBIGTAB to be the BIGTAB address of the root of
     the balanced tree. This will be found in TAB[⌈I/2⌉]; for
     example, if I=15, TAB[8] will contain the BIGTAB address of
     the root of the tree.

B4   Set F ← R ← P ← ⌈I/2⌉, the root of the tree. Set the delta
     field of TAB[P] ← P/2.

B5   Is F=0? (output queue exhausted?) Yes, go to B13 (exit).

B6   Set P ← F (front of queue). Compute right and left tree
     pointers for node P. Set DELTA ← delta field of TAB[P].
     Set Q ← BIGTAB address stored in TAB[P].

B7   If DELTA = 0, then node P is a terminal node with both pointers
     zero. Set BIGTABWORD to zero. Go to B12.

B8   DELTA ≠ 0. Compute left tree pointer. Set LINK ← P - DELTA.
     Set left part of BIGTABWORD to the BIGTAB address field of
     TAB[LINK]. Set the delta field of TAB[LINK] to DELTA/2 (as it
     is on the next lower level), and insert LINK into the rear of
     the output queue.

B9   Compute the right tree pointer. Set LINK ← P + DELTA. Since
     the right sub-tree may be incomplete, recompute DELTA. If
     LINK > N, then set DELTA ← DELTA/2, go to B9.

B10  If DELTA = 0, then the right sub-tree is empty. Set the right
     part of BIGTABWORD to zero. Since there is no right sub-tree,
     nothing needs to be put into the output queue. Go to B12.

B11  DELTA ≠ 0. Set right part of BIGTABWORD to the BIGTAB address
     field of TAB[LINK]. Set DELTA/2 into the delta field of
     TAB[LINK]. Insert LINK into the rear of the output queue.

B12  Set BIGTAB[Q].[22:26] ← BIGTABWORD. Set F to the next node
     from the front of the queue. Go to B5.

B13  Exit procedure. Entire tree is balanced.
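Steps B2 through B12 can be sketched in Python as follows. The sketch works on node positions 1..N in lexical order and returns the new left and right links as plain lists, leaving out the BIGTAB addresses and packed fields. It assumes that step B2 selects the smallest full-tree size that is at least N (otherwise positions above I could never be linked), and it uses a library queue in place of the queue-link field of TAB. The name `balance_links` is hypothetical.

```python
from collections import deque

def balance_links(n: int):
    """Return (root, left, right) for nodes 1..n held in lexical order.

    left[p] and right[p] are positions in TAB; 0 means a null link.
    """
    if n < 1:
        raise ValueError("need at least one node")
    full = 1
    while full < n:                  # B2: smallest 2**m - 1 that is >= n
        full = 2 * full + 1
    root = (full + 1) // 2           # B3: TAB[ceil(I/2)]
    left = [0] * (n + 1)
    right = [0] * (n + 1)
    delta = {root: root // 2}        # B4: delta field of the root
    queue = deque([root])
    while queue:                     # B5: stop when the queue is exhausted
        p = queue.popleft()          # B6
        d = delta[p]
        if d == 0:                   # B7: terminal node, both links null
            continue
        link = p - d                 # B8: left sub-tree is always complete
        left[p] = link
        delta[link] = d // 2
        queue.append(link)
        link = p + d                 # B9: halve delta until the link fits
        while link > n:
            d //= 2
            link = p + d
        if d != 0:                   # B10 / B11
            right[p] = link
            delta[link] = d // 2
            queue.append(link)
    return root, left, right
```

For N = 5 this gives root 4, with 2 (children 1 and 3) on the left and the single node 5 on the right, so the odd positions are the terminal nodes, as observed above.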
APPENDIX A


SYNTAX  AND  SEMANTICS  OF  SCANNER-DEFINED  ITEMS 


<*I>, the Identifier Metaclass

TBNF SYNTAX

<LETTER>                  ::= A|B|C|D|...|Z
<DECIMAL DIGIT>           ::= 0|1|2|3|4|5|6|7|8|9
<ALPHABETIC CHARACTER>    ::= <LETTER>|<ALPHABETIC CONTROL CARD OPTION>
<ALPHANUMERIC CHARACTER>  ::= <ALPHABETIC CHARACTER>|<DECIMAL DIGIT>|
                              <ALPHANUMERIC CONTROL CARD OPTION>
<*I>                      ::= <ALPHABETIC CHARACTER>|<*I>
                              <ALPHANUMERIC CHARACTER>


SEMANTICS 

All identifiers are limited to 48 characters in length. Identifiers
may not extend over card-image boundaries. The standard syntax of <ALPHABETIC
CHARACTER> or <ALPHANUMERIC CHARACTER> may be augmented by control card option
(see section 2.5).

<*R>,  the  Keyword  Metaclass 


TBNF SYNTAX

<*R>  ::= <*I>|"#" <*I>


SEMANTICS 

If  the  "RSWD"  control  card  option  is  selected,  a  simple  identifier  may 
be  used  as  a  keyword  -  and  cannot  be  re-declared  by  the  compiler-user.  The 
nominal  state  is  the  "SPWD"  or  special  word  option,  where  a  "#"  must  precede 
the  syntax-defined  keyword  identifier  for  it  to  have  keyword  meaning.  The 
programmer may then choose freely the identifiers he uses with no fear of
encountering  reserved  identifiers  he  did  not  know  about. 

<*N>,  the  Numeric  Literal  Metaclass 
TBNF SYNTAX

<DECIMAL INTEGER>        ::= list <DECIMAL DIGIT>
<DIGIT>                  ::= <LETTER>|<DECIMAL DIGIT>
<INTEGER>                ::= <DECIMAL DIGIT><DIGIT>*
<RADIX POINT>            ::= "."
<FRACTIONAL PART>        ::= <RADIX POINT><DIGIT>*
<EXPONENT DELIMITER>     ::= "@"
<EXPONENT PART>          ::= <EXPONENT DELIMITER>[+|-]?
                             <DECIMAL INTEGER>
<BASE LEFT DELIMITER>    ::= "("
<BASE RIGHT DELIMITER>   ::= ")"
<BASE PART>              ::= <BASE LEFT DELIMITER><DECIMAL INTEGER>
                             <BASE RIGHT DELIMITER>
<REAL>                   ::= [<INTEGER>|<FIXED POINT>]?
                             <EXPONENT PART>|<FIXED POINT>
<FIXED POINT>            ::= <INTEGER><RADIX POINT><DIGIT>*|
                             <RADIX POINT> list <DIGIT>
<*N>                     ::= [<INTEGER>|<REAL>]<BASE PART>?


SEMANTICS 

A  numeric  literal  may  be  split  across  card  images.  The  length  of  a 
numeric  literal  must  not  exceed  the  following  formula: 

I  +  N  <  62,  if  base  is  decimal 
I  +  N  <  61,  for  non-decimal  base, 
where I represents the total number of integer and fractional digits, not
counting  the  radix  point,  and  N  represents  the  number  of  digits  in  the  normal- 
ized exponent,  not  counting  the  exponent  delimiter.  No  blanks  may  be  embedded 
within  a  numeric  literal.  The  exponent  part  may  not  exceed  the  range  ±  999. 
The  base  part  must  be  in  the  range  2-36. 

EXAMPLES

INTEGER
    1
    0A3456  (must start with a decimal digit)

FIXED POINT
    1.
    1.ABCDE
    .34291

REAL
    @23
    1.0@3
    2489732@-728
    77A34Q.9L7@+3

<*N>
    ABCDE(16)
    3.489023(12)
    1011110001110(2)
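The numeric literal syntax is regular, so it can be transcribed into a regular expression. The Python sketch below follows the productions for integer, fixed point, exponent part, real, and base part, taking "." as the radix point, "@" as the exponent delimiter, and parentheses as the base delimiters, and it adds the semantic check that the base lies in the range 2-36. It is an illustration only: the length limits and the exponent range are not checked, and the names `NUMERIC` and `is_numeric_literal` are hypothetical.

```python
import re

DIGIT = r"[0-9A-Z]"                                   # <LETTER> | <DECIMAL DIGIT>
INTEGER = rf"[0-9]{DIGIT}*"                           # must start with a decimal digit
FIXED = rf"(?:{INTEGER}\.{DIGIT}*|\.{DIGIT}+)"        # <FIXED POINT>
EXPONENT = r"@[+-]?[0-9]+"                            # <EXPONENT PART>
REAL = rf"(?:(?:{FIXED}|{INTEGER})?{EXPONENT}|{FIXED})"
BASE = r"\((?P<base>[0-9]+)\)"                        # <BASE PART>
NUMERIC = re.compile(rf"(?:{REAL}|{INTEGER})(?:{BASE})?")

def is_numeric_literal(text: str) -> bool:
    """True if `text` matches the numeric literal syntax and the base is 2-36."""
    match = NUMERIC.fullmatch(text)
    if match is None:
        return False
    base = match.group("base")
    return base is None or 2 <= int(base) <= 36
```

A literal beginning with a letter is rejected by this pattern, because the integer production requires a leading decimal digit; in the scanner such text would be taken as an identifier by the head-symbol branch.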


<*S>,  the  String  Literal  Metaclass 

TBNF SYNTAX

<STRING BODY>   ::= [<ANY> but [" not "]]*
<STRING QUOTE>  ::= """
<*S>            ::= <STRING QUOTE><STRING BODY><STRING QUOTE>

SEMANTICS

The  definition  of  <STRING  B0DY>  indicates  that  if  the  string  quote 
appears  within  the  body,  it  must  be  double.  When  the  string  is  reduced  to  a 
BIGTAB  entry,  the  redundant  quote  is  deleted.  The  string  literal  must  be 
completely  contained  in  one  card-image,  and  may  not  exceed  48  characters  (not 
counting  redundant  double  string  quotes). 

EXAMPLES 


INPUT (stored as "INPUT")

"EXPAND", "ILLIAC IV TRANSLATOR WRITING SYSTEM"


THE  TWS  MACRO  FACILITY 


TBNF SYNTAX

<MACRO DECLARATION>            ::= DEFINE list <MACRO DEFINITION>
                                   separator "," ";"
<MACRO DEFINITION>             ::= <*I><MACRO FORMAL PARAMETER PART> "="
                                   [<ANY> but <MEND> but DEFINE]* <MEND>
<MACRO FORMAL PARAMETER PART>  ::= "[" list <*I> separator "," "]"|
                                   "(" list <*I> separator "," ")"
<MEND>                         ::= "?"|MEND|<CONTROL CARD MEND>
<MACRO INVOCATION>             ::= <*I><MACRO ACTUAL PARAMETER PART>
<MACRO ACTUAL PARAMETER PART>  ::= "[" list <MACRO ACTUAL PARAMETER>
                                   separator "," "]"|
                                   "(" list <MACRO ACTUAL PARAMETER>
                                   separator "," ")"
<MACRO ACTUAL PARAMETER>       ::= [<ANY> but DEFINE but
                                   <UNBRACKETED COMMA>]*
<UNBRACKETED COMMA>            ::= ","

SEMANTICS 

The  limitation  that  a  macro  definition  not  contain  "DEFINE"  enforces 
the  rule  that  macro  declarations  not  be  nested.  Macro  calls  may  exist  either 
within  the  defined  text,  or  in  an  actual  parameter.  The  restriction  exists 
that  a  macro  may  not  contain  a  call  on  itself,  either  directly  or  indirectly. 

The  unbracketed  comma  is  a  recognition  that  an  actual  parameter  may 
contain  virtually  any  text.  Specifically,  if  another  macro  call  is  in  the 
actual  parameter  part,  or  a  call  on  a  procedure  containing  its  own  actual 
parameters,  delimited  by  commas,  a  way  must  be  devised  for  defining  the  param- 
eter delimiter  comma.  The  definition  that  has  been  implemented  is: 

<UNBRACKETED COMMA>  ::= A level zero comma where, when the
                         initial "[" or "(" is recognized, the
                         level is set to zero, and incremented
                         by one for each subsequent "[" or "("
                         and decremented by one for each
                         subsequent "]" or ")".

This concept could be expanded by modifying the procedure
PROCESSMACROACTUALPARAMETERPART in a language-specific manner to accommodate such
additional bracketing pairs that might occur in an actual parameter as
INTEGER - ";" or REAL - ";".

<CONTROL CARD MEND> is a control card option "MACROEND X", where X may
be "MEND" or a single special character. If "MEND" appears, then this keyword
will mark the end of the macro definition. If a single special character
appears, then that character will end the definition.

EXAMPLES 

<MACRO DECLARATION>

$ MACROEND #

DEFINE MACROTABLE(P1) = MACROTAB[(TWSTI P1).[36:5],
TWSTI.[41:7]] #, INCR(X) = X ← X+1 #, ALPHABETIC = [42:6] # ;

<MACRO INVOCATION>

. . . MACROTABLE(IF A+B Y THEN ELSE (INCR(W))) . . .
ADD (A[3,4,7,9,12], B[4,3]);
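The level-zero comma rule can be sketched in a few lines of Python. The function below takes the actual parameter part, including its enclosing brackets, and splits it at unbracketed commas by keeping a bracket level counter as described in the definition of the unbracketed comma. It works on characters rather than on scanned symbols and ignores string literals, both of which are simplifications; the name `split_actual_parameters` is hypothetical.

```python
def split_actual_parameters(text: str) -> list:
    """Split "(p1, p2, ...)" or "[p1, p2, ...]" at level-zero commas."""
    openers, closers = "[(", "])"
    if not text or text[0] not in openers:
        raise ValueError("parameter part must start with '[' or '('")
    level = 0                               # set to zero at the initial bracket
    params, current = [], []
    for ch in text[1:]:
        if ch in openers:
            level += 1
        elif ch in closers:
            if level == 0:                  # bracket closing the parameter part
                params.append("".join(current).strip())
                return params
            level -= 1
        elif ch == "," and level == 0:      # unbracketed comma: next parameter
            params.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    raise ValueError("unterminated parameter part")
```

Applied to the second invocation above, `split_actual_parameters("(A[3,4,7,9,12], B[4,3])")` yields the two parameters `A[3,4,7,9,12]` and `B[4,3]`; the commas inside the subscript lists are at level one and are left alone.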
APPENDIX B

SAMPLE PROGRAM

In  order  to  illustrate  the  concepts  described  in  this  paper,  we 
consider  a  comprehensive  example.  This  begins  with  the  TBNF  grammar  that 
defined  the  initial  symbol  table  and  proceeds  through  the  balancing  of  the 
symbol  table  at  the  beginning  of  the  compilation  of  a  program  and  through 
the  scanning  of  the  text  of  a  sample  program. 

TBNF  Grammar  for  the  Language  DEMALGOL 

The  following  language  is  used  as  a  TWS  bench  mark.  This  language, 
with  the  addition  of  the  semantic  actions,  is  documented  in  Trout  [4]. 

DEMALGOL

<PROGRAM>             ::=  <BLOCK>;

<BLOCK>               ::=  BEGIN <DECLARATION>* list <STATEMENT>
                           separator ";"
                           END ;

<DECLARATION>         ::=  [INTEGER | BOOLEAN | LABEL]
                           [list <*I> separator "," | <ERROR>];

<STATEMENT>           ::=  [<LABEL> ":"]* [[GO TO | GOTO] <LABEL> |
                           IF <BOOLEAN> THEN <STATEMENT> |
                           IF <BOOLEAN> THEN <STATEMENT>
                           [ELSE <STATEMENT>]? |
                           <VARIABLE> [":=" | "←"] <VALUE> |
                           <> [ahead ";" | ahead END | ahead ELSE] |
                           <ERROR>];

<LABEL>               ::=  <*I>;

<VALUE>               ::=  list <ARITHMETIC PRIMARY> separator
                           ["+" | "-" | "x" | "/"];

<BOOLEAN>             ::=  <*I> | "(" <BOOLEAN> ")" |
                           <VALUE> ["=" | "≠" | "<" | ">" | "≤" | "≥"]
                           <VALUE>;

<ARITHMETIC PRIMARY>  ::=  "(" <VALUE> ")" | <VARIABLE> | <*N>;

<VARIABLE>            ::=  <*I>;

<ERROR>               ::=  <ANY> [<ANY> but ";" but END]*;

END


BIGTAB  as  Initialized  by  the  Syntax  Preprocessor 

Figure 5 represents the BIGTAB as produced by the syntax preprocessor for
DEMALGOL, with DEFINE and MEND added during the initialization of the run-time
compiler. For the head nodes, four fields are shown: the semantic part,
containing the preprocessor-assigned symbol number for each keyword (in
decimal); the length field, giving the number of words (indexed from zero)
followed by the number of valid characters in the last word (indexed from
one); and the left and right pointer fields (in decimal). For the sake of
clarity, the nonterminals that would be inserted in the symbol table are
omitted from this example.
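
As an illustration of the length field, the following Python sketch computes it for an identifier and lays the identifier out in data words. It is a reading of the trace listings later in this appendix, not the scanner's code: the names `length_field` and `data_words` are hypothetical, the figure of six name characters per data word is inferred from entries such as 00GOOD-R, and the leading "00" of each data word is copied from the listings without modeling its meaning (section 2.2 is the authority on the word layout). Numeric entries are laid out differently and are not covered.

```python
CHARS_PER_WORD = 6   # assumption: six name characters per data word


def length_field(name):
    """Return the two-digit length field shown in the BIGTAB listings."""
    if not name:
        raise ValueError("empty identifier")
    last_word = (len(name) - 1) // CHARS_PER_WORD          # indexed from zero
    chars_in_last = len(name) - last_word * CHARS_PER_WORD  # indexed from one
    return f"{last_word}{chars_in_last}"


def data_words(name):
    """Return the data words in the character notation of the listings."""
    return ["00" + name[i:i + CHARS_PER_WORD]
            for i in range(0, len(name), CHARS_PER_WORD)]
```

For GOOD-RIGHT-TRIANGLE this gives the length field 31 (four words, the last holding one character) and the data words 00GOOD-R, 00IGHT-T, 00RIANGL and 00E, which agrees with the entries BIGTAB[61] through BIGTAB[65] in the trace.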


SAMPLE  DEMALGOL  PROGRAM 

$RSWD LIST ALPHANUMERIC - BALANCE                       001
%   SOLUTION OF RIGHT TRIANGLES WITH SIMULATED I/O      002
%   AFTER AN EXAMPLE IN THE MAD PRIMER [9]              003
                                                        004
BEGIN                                                   005
INTEGER A, B, C, X, TEMP                                006
LABEL GOOD-RIGHT-TRIANGLE, NOT-A-RIGHT-TRIANGLE,        007
READ-LOOP                                               008
DEFINE ABS ( ARG, ANS ) =                               009
TEMP ← ARG;                                             010
IF TEMP < 0 THEN                                        011
ANS ← 0 - TEMP                                          012
ELSE ANS ← TEMP ? ,                                     013
GO-TO-GOOD =                                            014
IF X < 0.1 THEN GO TO GOOD-RIGHT-TRIANGLE ? ;           015
READ-LOOP:    %     READ A, B, C; WRITE A, B, C         016
ABS ( ((AxA) + (BxB)) - (CxC), X);                      017
GO-TO-GOOD;                                             018
ABS ( ((BxB) + (CxC)) - (AxA), X);                      019
GO-TO-GOOD;                                             020
ABS ( ((CxC) + (AxA)) - (BxB), X)                       021
GO-TO-GOOD;                                             022
                                                        023
NOT-A-RIGHT-TRIANGLE:                                   024
%     WRITE "NOT A RIGHT TRIANGLE"                      025
GOTO READ-LOOP;                                         026
                                                        027
GOOD-RIGHT-TRIANGLE:                                    028
%  WRITE "GOOD RIGHT TRIANGLE"                          029
GO TO READ-LOOP;                                        030
END                                                     031

ACTION BY THE SCANNER ON THIS PROGRAM

This  section  will  trace  the  major  actions  of  the  scanner  on  text 
lines  1-18  of  the  above  program,  with  the  following  notes  on  format: 

a)  SCAN  descriptors  are  shown  with  five  fields,  representing 
the  class,  symbol  number  or  semantic  part,  BIGTAB  pointer 
or  symbol  number,  class  and  BIGTAB  pointer  or  symbol  number. 
The  values  are  in  decimal. 

b)  BIGTAB header words are shown with four fields: semantic
part, number of words/characters, left tree pointer, right
tree pointer. In the case of macro definitions or parameters,
the semantic part is represented by (a/b/c), where a is bit
[1:1], the macro flag, b is bit [2:1], the formal parameter
flag, and c is bit [3:12], the macro address or parameter
number.

c)  BIGTAB  data  words  are  shown  as  they  appear  in  section  2.2, 
i.e.,  in  character  notation. 

d)  MACROTAB  entries  are  shown  with  various  fields,  as  described 
in  section  2.4.3.  If  a  field  is  designated  as  unused,  it 
will  not  be  listed  below. 
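
The BIGTAB updates in the trace that follows are consistent with ordinary insertion into a binary search tree: each new identifier is compared with the names already present, and the left or right pointer of the last node visited is set to the new head word. The Python sketch below illustrates this under stated assumptions. The `Node` class and `insert` function are hypothetical names, the table is a Python list rather than packed words, Python's string ordering stands in for the machine's collating sequence, and the tree is started from the single keyword AND rather than from the full balanced table of Figure 7.

```python
class Node:
    """One symbol: name plus left and right pointers (0 means no link)."""

    def __init__(self, name):
        self.name = name
        self.left = 0
        self.right = 0


def insert(table, root, name):
    """Find or insert `name` below `root`; return the index of its node."""
    index = root
    while True:
        node = table[index]
        if name == node.name:
            return index                      # already in the table
        side = "left" if name < node.name else "right"
        link = getattr(node, side)
        if link == 0:                         # empty link: attach new node here
            table.append(Node(name))
            setattr(node, side, len(table) - 1)
            return len(table) - 1
        index = link                          # otherwise keep descending


# Index 0 is unused so that a zero pointer can mean "no link".
table = [None, Node("AND")]
for identifier in ["A", "B", "ABS", "ARG", "ANS"]:
    insert(table, 1, identifier)
```

With these insertions A is linked to the left of AND and B to its right, ABS to the right of A, ARG to the left of B, and ANS to the left of ARG, which is the pattern of links recorded in the trace for cards 006 and 009.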

$RSWD LIST ALPHANUMERIC - BALANCE                       001

CHARCLASS[#]  =  0 

SPSTYP  =0  (i.e.,  reserved  word  option  rather  than  special) 

CHARCLASS[-].  44  =  1 

(See  Figure  7  for  the  results  of  balancing  the  tree) 

BIGTAB[1]  =  33  (links  initial  balanced  BIGTAB  to  <*I>  tree) 

BIGTAB[15]  =  0 

Lines 2, 3 and 4 cause no output by the scanner other than the listing.


BEGIN                                                   005

*SCAN descriptor     15/66/24/15/66    BEGIN

INTEGER A, B, C, X, TEMP                                006

*SCAN descriptor     15/68/34/15/68    INTEGER
BIGTAB[51]        =  0/01/0/0
BIGTAB[52]        =  00A
BIGTAB[43]        =  77/03/51/0        (link to AND)
*SCAN descriptor     1/0/51/1/51       A
*SCAN descriptor     15/58/58/15/58    ,
BIGTAB[53]        =  0/01/0/0
BIGTAB[54]        =  00B
BIGTAB[43]        =  77/03/51/53       (link to AND)
*SCAN descriptor     1/0/53/1/53       B
*SCAN descriptor     15/58/58/15/58    ,
BIGTAB[55]        =  0/01/0/0
BIGTAB[56]        =  00C
BIGTAB[26]        =  69/11/0/55        (link to BOOLEAN)
*SCAN descriptor     1/0/55/1/55       C
*SCAN descriptor     15/58/58/15/58    ,
BIGTAB[57]        =  0/01/0/0
BIGTAB[58]        =  00X
BIGTAB[33]        =  72/02/31/57       (link to TO)
*SCAN descriptor     1/0/57/1/57       X
*SCAN descriptor     15/58/58/15/58    ,
BIGTAB[59]        =  0/04/0/0
BIGTAB[60]        =  00TEMP
BIGTAB[39]        =  75/04/59/0        (link to THEN)
*SCAN descriptor     1/0/59/1/59       TEMP

LABEL GOOD-RIGHT-TRIANGLE, NOT-A-RIGHT-TRIANGLE,        007


*SCAN descriptor     15/70/29/15/70    LABEL
BIGTAB[61]        =  0/31/0/0
BIGTAB[62]        =  00GOOD-R
BIGTAB[63]        =  00IGHT-T
BIGTAB[64]        =  00RIANGL
BIGTAB[65]        =  00E
BIGTAB[35]        =  73/04/61/0        (link to GOTO)
*SCAN descriptor     1/0/61/1/61       GOOD-RIGHT-TRIANGLE
*SCAN descriptor     15/58/58/15/58    ,
BIGTAB[66]        =  0/32/0/0
BIGTAB[67]        =  00NOT-A-
BIGTAB[68]        =  00RIGHT-
BIGTAB[69]        =  00TRIANG
BIGTAB[70]        =  00LE
BIGTAB[59]        =  0/04/66/0         (link to TEMP)
*SCAN descriptor     1/0/66/1/66       NOT-A-RIGHT-TRIANGLE
*SCAN descriptor     15/58/58/15/58    ,

READ-LOOP                                               008

BIGTAB[71]        =  0/13/0/0
BIGTAB[72]        =  00READ-L
BIGTAB[73]        =  00OOP
BIGTAB[66]        =  0/32/0/71         (link to NOT-A-RIGHT...)
*SCAN descriptor     1/0/71/1/71       READ-LOOP

DEFINE ABS ( ARG, ANS ) =                               009


BIGTAB[74]        =  (1/0/0)/03/0/0
BIGTAB[75]        =  00ABS
BIGTAB[51]        =  0/01/0/74         (link to A)
MACROTAB[1]       =  0/74/0/0          (set up head word 2)
BIGTAB[76]        =  (1/1/0)/03/0/0
BIGTAB[77]        =  00ARG
BIGTAB[53]        =  0/01/76/0         (link to B)
MACROTAB[4095]    =  0/77              (formal parameter save)
BIGTAB[78]        =  (1/1/1)/03/0/0
BIGTAB[79]        =  00ANS
BIGTAB[76]        =  (1/1/0)/03/78/0   (link to ARG)
MACROTAB[4094]    =  0/79              (formal parameter save)

TEMP ← ARG;                                             010


MACROTAB[2]       =  1/0/59/1/59       (scan of TEMP)
MACROTAB[3]       =  15/31/31/15/31    (scan of ←)
MACROTAB[4]       =  10/0/0            (formal parameter)
MACROTAB[5]       =  15/46/46/15/46    (scan of ;)

IF TEMP < 0 THEN                                        011


MACROTAB[6]       =  15/74/37/15/74    (scan of IF)
MACROTAB[7]       =  1/0/59/1/59       (scan of TEMP)
MACROTAB[8]       =  15/30/30/15/30    (scan of <)
BIGTAB[77]        =  0/06/0/0
BIGTAB[78]        =  01000000
BIGTAB[2]         =  0/0/0/77          (start numeric tree)
MACROTAB[9]       =  2/0/77/2/77       (scan of 0)
MACROTAB[10]      =  15/75/39/15/75    (scan of THEN)

ANS ← 0 - TEMP                                          012


MACROTAB[11]      =  10/0/1            (formal parameter ANS)
MACROTAB[12]      =  15/31/31/15/31    (scan of ←)
MACROTAB[13]      =  2/0/77/2/77       (scan of 0)
MACROTAB[14]      =  15/44/44/15/44    (scan of -)
MACROTAB[15]      =  1/0/59/1/59       (scan of TEMP)

ELSE ANS ← TEMP ? ,                                     013


MACROTAB[16]      =  15/76/41/15/76    (scan of ELSE)
MACROTAB[17]      =  10/0/1            (formal parameter ANS)
MACROTAB[18]      =  15/31/31/15/31    (scan of ←)
MACROTAB[19]      =  1/0/59/1/59       (scan of TEMP)
MACROTAB[20]      =  15/14/14/15/14    (scan of ?)
BIGTAB[78]        =  0/03/0/0          (reset BIGTAB semantic-ARG)
BIGTAB[76]        =  0/03/0/0          (reset BIGTAB semantic-ANS)
MACROTAB[0]       =  20/0/2            (set up head word 1)

GO-TO-GOOD =                                            014


BIGTAB[79]        =  (1/1/21)/14/0/0
BIGTAB[80]        =  00GO-TO-
BIGTAB[81]        =  00GOOD
BIGTAB[61]        =  0/31/79/0         (link to GOOD-RIGHT...)
MACROTAB[22]      =  0/79/0/0          (set up head word 2)

IF X < 0.1 THEN GO TO GOOD-RIGHT-TRIANGLE ? ;           015


MACROTAB[23]      =  15/74/37/15/74    (scan of IF)
MACROTAB[24]      =  1/0/57/1/57       (scan of X)
MACROTAB[25]      =  15/30/30/15/30    (scan of <)
BIGTAB[82]        =  0/06/0/0
BIGTAB[83]        =  : 2011 000        (numeric 0.1)
BIGTAB[77]        =  0/06/82/0         (link to 0)
MACROTAB[26]      =  2/0/82/2/82       (scan of 0.1)
MACROTAB[27]      =  15/75/39/15/75    (scan of THEN)
MACROTAB[28]      =  15/71/31/15/31    (scan of GO)
MACROTAB[29]      =  15/72/33/15/33    (scan of TO)
MACROTAB[30]      =  1/0/61/1/61       (scan of GOOD-RIGHT-...)
MACROTAB[31]      =  15/14/14/15/14    (scan of ?)
MACROTAB[21]      =  31/0/0            (set macro head word 1)

READ-LOOP:    %     READ A, B, C; WRITE A, B, C         016


*SCAN descriptor     1/0/71/1/71       READ-LOOP
*SCAN descriptor     15/13/13/15/13

ABS ( ((AxA) + (BxB)) - (CxC), X) ;                     017


MACROTAB[0]       =  20/32/2           (set actual parameter address in head word 1)
MACROTAB[33]      =  15/29/29/15/29    (scan of (, begin actual parameter)
MACROTAB[34]      =  15/29/29/15/29    (scan of ( )
MACROTAB[35]      =  1/0/51/1/51       (scan of A)
MACROTAB[36]      =  15/32/32/15/32    (scan of x)
MACROTAB[37]      =  1/0/51/1/51       (scan of A)
MACROTAB[38]      =  15/45/45/15/45    (scan of ) )
MACROTAB[39]      =  15/16/16/15/16    (scan of +)
MACROTAB[40]      =  15/29/29/15/29    (scan of ( )
MACROTAB[41]      =  1/0/53/1/53       (scan of B)
MACROTAB[42]      =  15/32/32/15/32    (scan of x)
MACROTAB[43]      =  1/0/53/1/53       (scan of B)
MACROTAB[44]      =  15/45/45/15/45    (scan of ) )
MACROTAB[45]      =  15/45/45/15/45    (scan of ) )
MACROTAB[46]      =  15/44/44/15/44    (scan of -)
MACROTAB[47]      =  15/29/29/15/29    (scan of ( )
MACROTAB[48]      =  1/0/55/1/55       (scan of C)
MACROTAB[49]      =  15/32/32/15/32    (scan of x)
MACROTAB[50]      =  1/0/55/1/55       (scan of C)
MACROTAB[51]      =  15/45/45/15/45    (scan of ) )
MACROTAB[52]      =  15/58/58/15/58    (scan of parameter delimiter comma)
MACROTAB[32]      =  19/33/0/0         (actual parameter addresses and lengths)
MACROTAB[53]      =  1/0/57/1/57       (scan of X, parameter 2)
MACROTAB[54]      =  15/45/45/15/45    (scan of ), end of parameter part)
MACROTAB[32]      =  19/33/1/53        (actual parameter addresses and lengths)
MACROTAB[20]      =  9/0/0             (set outermost return descriptor)
*SCAN descriptor     1/0/59/1/59       (TEMP, from MACROTAB[2])
*SCAN descriptor     15/31/31/15/31    (←, from MACROTAB[3])
MACROTAB[52]      =  9/5/0             (set parameter 1 return)
*SCAN descriptor     15/29/29/15/29    ((, from MACROTAB[33], actual parameter table)

*SCAN descriptor     15/45/45/15/45    (), from MACROTAB[51])
*SCAN descriptor     15/46/46/15/46    (;, from MACROTAB[5])

*SCAN descriptor     15/75/39/15/75    (THEN, from MACROTAB[10])
MACROTAB[54]      =  9/12/0            (set parameter 2 return)
*SCAN descriptor     1/0/57/1/57       (X, from MACROTAB[53], actual parameter table)
*SCAN descriptor     15/31/31/15/31    (←, from MACROTAB[12])

*SCAN descriptor     15/76/41/15/76    (ELSE, from MACROTAB[16])
MACROTAB[54]      =  9/18/0            (parameter 2 return word)
*SCAN descriptor     1/0/57/1/57       (X, from MACROTAB[53])
*SCAN descriptor     15/31/31/15/31    (←, from MACROTAB[18])
*SCAN descriptor     1/0/59/1/59       (TEMP, from MACROTAB[19])
*SCAN descriptor     15/46/46/15/46    (;, from card 17)

GO-TO-GOOD;                                             018


MACROTAB[31]      =  9/0/0             (set outermost return)
*SCAN descriptor     15/74/37/15/74    (IF, from MACROTAB[23])

*SCAN descriptor     1/0/61/1/61       (GOOD-RIGHT-TRIANGLE, from MACROTAB[30])

*SCAN descriptor     15/46/46/15/46    (;, from card 18)
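
The effect of the trace for cards 009 through 017 can be summarized by a small model of parameterized macro expansion. The Python sketch below is an illustration only. The function names `define_macro` and `expand_macro` and the ("PARAM", n) and ("TOKEN", t) tuples are hypothetical stand-ins for the MACROTAB descriptors, and the sketch copies the actual parameter tokens into the output, whereas the scanner traced above leaves them in the actual parameter table and follows return descriptors.

```python
def define_macro(formals, body_tokens):
    """Store a macro body, replacing formal names by parameter numbers."""
    number = {name: n for n, name in enumerate(formals)}
    return [("PARAM", number[t]) if t in number else ("TOKEN", t)
            for t in body_tokens]


def expand_macro(stored_body, actuals):
    """Produce the token stream for one call; `actuals` is a list of token lists."""
    output = []
    for kind, value in stored_body:
        if kind == "PARAM":
            output.extend(actuals[value])    # substitute the actual parameter
        else:
            output.append(value)             # ordinary token from the body
    return output


# The ABS macro of cards 009-013, written as a token list.
ABS_BODY = ["TEMP", "←", "ARG", ";", "IF", "TEMP", "<", "0", "THEN",
            "ANS", "←", "0", "-", "TEMP", "ELSE", "ANS", "←", "TEMP"]
ABS = define_macro(["ARG", "ANS"], ABS_BODY)

# The call on card 017: ABS ( ((AxA) + (BxB)) - (CxC), X)
ACTUALS = [list("((AxA)+(BxB))-(CxC)"), ["X"]]
EXPANSION = expand_macro(ABS, ACTUALS)
```

In the stored body the third, tenth and sixteenth entries are parameter references, corresponding to the formal parameter descriptors at MACROTAB[4], MACROTAB[11] and MACROTAB[17] in the trace. The expansion begins TEMP ← ( ( A x A ) and substitutes X for both occurrences of ANS.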